A multi-level maintenance tier testability evaluation model is developed. This
model provides measures of effectiveness of the performance of the multi-level
testability system, taking into account the imperfections of the diagnostic
system (false alarms, Can Not Duplicate, and Retest Okay).
SUMMARY


In recent years, automatic fault detection/fault isolation (FD/FI) systems have
been widely used as maintenance tools for electronic equipment/systems. However,
the operational experience with FD/FI systems has not been good because of the
lack of effective operational testing measures that can express the adequacy of
the diagnostic system. The objective of this study was to investigate performance
models for the analysis of the testability of avionics systems.

In general, the problem with all known diagnostic measures can be traced to
inadequate and ambiguous definitions of terms, parameters, and their meanings. In
addition, most parameters are defined and determined as if the levels of
maintenance had nothing to do with them, which is not always true. Therefore,
single and multi-level diagnostic systems are represented by decision trees in
which testability parameters are accurately and unambiguously defined at each
level. Accordingly, a multi-level maintenance tier testability evaluation model,
which contains all levels of testability parameters at the organizational,
intermediate, and depot levels, is developed. In this model three measures of
effectiveness of the performance of the multi-level testability system are
developed, and analytical procedures to evaluate these measures are derived,
taking into account all problems which may arise from the implementation of
automatic diagnostic systems.

The first measure represents the occurrence of intermittent and temporary
faults as well as the potential of the test equipment either to cause malfunction
in the system or not to work properly, while the second measure reflects the
failure of the testing system to perform its major objective of detecting and
isolating faults when they occur. These two measures represent the accuracy of
the diagnostic system and the ability of the test equipment at each level to
perform.

The third measure represents the precision of the testability system and the
ability of different test equipment at the same or upper levels to repeat the
same results according to its tolerances and precision. This measure mainly
covers Can Not Duplicate and Retest Okay events at different levels.

Furthermore, new optimization procedures have been developed to aid in the
evaluation of reliability, maintainability, and availability of the system. In
addition, all costs associated with the errors of the diagnostic system are
developed and modeled to express the effectiveness of the diagnostic system.
These costs are also used to predict the life cycle cost for the equipment/system,
taking into account the actual performance of the diagnostic system and the
resulting consequences of its imperfections.
1.  INTRODUCTION


1.1  Background

In recent years the development and use of automatic diagnostic systems as
maintenance tools for electronic equipment/systems has increased significantly.
The available advanced technologies in electronics have allowed the development
of increasingly complex systems and have necessitated the development of a
modular diagnostic concept. Consequently, automation was introduced into the
fault diagnostic process at the system, subsystem, and equipment levels.

The incorporation of automatic fault detection/fault isolation (FD/FI) systems
which use Built-in-Test (BIT) and/or External Test Equipment (ETE) can be a
significant aid to system maintainability and system availability through the
automatic detection and isolation of malfunctions without having to resort to
time-consuming manual troubleshooting techniques. Furthermore, the manpower and
training necessary to support complex systems can be reduced.

However, advances in electronic technology have outpaced the technology of
efficient and effective fault diagnostic design. Few new procedures or techniques
have been developed to aid in the design of cost-effective automated fault
diagnostic systems which include BIT/ETE systems as part of a comprehensive
multi-level maintenance plan. In addition, the operational test and evaluation
experience with FD/FI systems has, unfortunately, not been good, for lack of an
effective operational testing methodology. For this reason, it has been virtually
impossible to accurately assess a system's real diagnostic capability, let alone
its contribution to overall system availability.

Furthermore, the implementation of an automatic diagnostic system can produce
three types of problems: false alarms, Can Not Duplicate events (CNDs), and
Retest Okay events (RTOKs). When the diagnostics indicate a failure but no system
degradation is apparent to the operator, the event is called a false alarm. Such
failure indications are thought to be caused by momentary excursions of the
system outside its set parameters. The major impact of false alarm events is a
reduction of operator confidence in the diagnostics, and possibly the unnecessary
isolation of good units and their introduction into the repair cycle, with all
the attendant consequences. When subsequent maintenance investigation fails to
duplicate the condition for which a system was written up, the event is a CND.
CNDs may be caused by intermittent failures, and they result in the expenditure
of resources without valid system repair. An RTOK is a malfunction which, when
detected and isolated by the automatic diagnostics at one level of maintenance,
is not detectable at a higher level. A possible cause of RTOK events is a lack of
vertical testability. Because of the significant effect of false alarm, CND, and
RTOK events on life cycle cost, system maintainability, and availability, all
these events should be carefully defined, studied, and included in any study of
automatic system diagnostics. In addition, strategies must be developed to
minimize these events.

A multi-level maintenance system consists of three levels. The lowest level
is organizational, where a faulty system is tested to isolate the line replaceable
unit (LRU) that includes the faulty module. This LRU is removed from the system
and a spare substituted so that the system may resume operation. The faulty LRU
is sent to the shop level, where the faulty module or shop replaceable unit is
isolated and replaced. The LRU is then returned to the organizational level for
standby. The faulty module may be sent to the depot for repair or may be
discarded, based on the cost of repair versus replacement.

1.2  Related  Research 

A review of the technical literature shows that most of the work in the field
of testability has dealt with very special problems, mainly in the areas of
design of diagnostics, evaluation and assessment of diagnostic systems, and cost
characteristics and design guidelines for testing systems. In most of the
references, only certain testability parameters were considered and defined to
fit a very special problem, without any effort to relate such parameters to the
entire composite diagnostic system, which includes the organizational, shop, and
depot levels.

As for the problem of designing optimum testing procedures, the search
started in the late fifties, when Gluss (1959) tackled the problem of a fault
developing in a system consisting of n modules, each of which contains several
components. He presents two mathematical models to dictate search strategies
that will minimize a stipulated cost function. Firstman and Gluss (1960) extend
Gluss's work by considering different ways to estimate the probabilities of
faults lying in the respective modules. A related work by Johnson et al. (1960)
discusses the generation of efficient sequential test procedures by using
information-theoretic methods to evaluate the amount of information provided by
a test. Kletsky (1960) demonstrates the validity of the information theory
approach by studying a standard communication receiver, and then proposes a
diagnostic procedure to test it. He reports that this method can be adapted to
provide diagnostic procedures appropriate to almost any level of maintenance
(organization, shop, or depot). Winter (1960) derives necessary conditions for
finding an optimal testing sequence by successive permutations of adjoining
units using conditional probabilities and statistical analysis.


Chang (1968) introduces the distinguishability criterion for computing the
figures of merit of tests and accordingly derives efficient testing procedures.
Cohn and Ott (1971) present a recursive algorithm based on the concept of dynamic
programming to specify an adaptive testing procedure that detects a failure and
isolates the faulty component while minimizing the expected cost of testing.
Butterworth (1972) considers a system which works if K or more of its N
components work. He develops a mathematical model to derive several rules for
finding the optimal sequential policies for series and parallel systems with
independent LRUs. Halpern (1974) presents a simple heuristic adaptive sequential
testing procedure for the K-out-of-N system with equal cost for all tests. Pieper
et al. (1974) develop a step-by-step computerized procedure for generating
complete troubleshooting trees which identify the system's functional unit that
is causing observable system malfunction indications. Sheskin (1977, 1979)
develops a probabilistic dynamic programming procedure to determine the sequence
of tests that isolates the group of modules containing the faulty unit. He also
presents a hybrid dynamic programming algorithm to determine the optimum
partition of the equipment and the set of tests which should be executed by BIT
in order to produce this partition.

Aly (1979) presents a branch-and-bound algorithm to solve the problem of the
optimum design of diagnostics (fault detection and isolation). Although no
computational experience is provided, the algorithm has a strong tendency to
reduce both the computational and the storage burden in comparison to algorithms
employing dynamic programming. Aly (1980) develops several dominance and
reduction rules which improve the performance of his branch-and-bound algorithm.
Aly and Elsayedaly (1981) provide comprehensive computational results for the
branch-and-bound algorithm and also show its superiority over other methods
based on dynamic programming.


The literature rarely addresses the problem of evaluating and assessing
diagnostic systems. The emphasis of such works is on finding a valid and reliable
procedure to check the effectiveness of BIT/ETE systems, or to evaluate FD/FI
systems. Poliska et al. (1979) study a diagnostic system which consists of BIT
and/or external test equipment in order to determine the measures and figures of
merit that are required to determine the adequacy of the system. Simple
mathematical models are used to evaluate the figures of merit using scoring
factor weights. Conley (1980) presents a Failure Modes and Effects Analysis
(FMEA) procedure to be used on a complex digital data system where the FD/FI is
specified for the system. Tuttle and Loveless (1980) study the reliability of the
BIT/ETE system as a function of the complexity, physical characteristics, and
functional characteristics of the BIT/ETE used in support of a system. They also
study the impact on the operation of the prime equipment due to the failure modes
of BIT/ETE using correlation analysis. Horkovich (1981) discusses the importance
of developing an efficient methodology to evaluate fault detection/fault
isolation systems, taking into consideration the overall system
Mean-Time-To-Repair (MTTR), CND, and RTOK rates. Linden (1981) studies the
effectiveness of BIT/ETE and discusses approaches and trends towards highly
automated diagnostics. False alarms, CNDs, and RTOKs, which are inherent in such
systems, are explained, together with their role in determining the
effectiveness of BIT/ETE systems. He uses the expected number of removals that
occur per single prime system failure as a measure of the effectiveness of the
system and of how effectively the associated test equipment is performing its
designated job of fault detection and isolation. Aly and Bredeson (1983) discuss
many aspects of diagnostic procedures and check some prediction parameters for
their effect on a reliable system.




In the area of cost characteristics and design guidelines for testing systems,
Gaertner (1974) describes the design of the BIT circuitry for tactical FM radios,
considering functional and physical characteristics of the BIT system. Levy et
al. (1976) study test procedures and specifications during the depot repair
cycle. They develop a method for identifying key maintenance decisions and
optimizing tests and test decisions in order to minimize support costs. Biegel
and Bulcha (1978, 1979) study the multilevel modularization/partitioning of large
electronic networks subject to physical, MTTR, and availability constraints in
order to minimize the life cycle cost. They develop a generalized procedure that
is capable of handling any number of levels of modularization. Bogard (1980)
studies the logistic support cost characteristics of BIT/ETE in order to develop
guidelines and relationships for use in the development phase of an Air Force
electronic equipment program to estimate operation and support costs associated
with various types of testers and test subsystems. Heckelman et al. (1981)
investigate the effects of architecture, functional partitioning, and module and
component features on micro-programmable self-diagnosing capabilities of digital
processors. These results are then used to create a set of design guidelines for
designing self-diagnosing, fault-tolerant, highly reliable microprocessors,
namely monolithic and bit-slice processors using LSI devices. Aly and McDonald
(1983) develop a minimum expected cost diagnostic procedure based on the combined
costs of packaging and testing at the organizational and intermediate levels.

From the above survey, none of the references addresses the optimization of
the entire multi-level system, taking into account the effectiveness and
reliability of organizational built-in-test/automatic test equipment, shop test,
and depot test as a function of the physical and functional characteristics of
these tests as well as the overall fault detection/fault isolation (FD/FI) of the
system. Even though statistical methods are utilized for some problems, very few
of them consider a realistic life cycle cost of the system which takes into
account the penalties and costs associated with all errors of the diagnostic
system at all levels of repair. Also, all figures of merit are too inconsistent
to be effective in the design of the prime system, as a result of the ambiguities
and differences in interpretation of different testability parameters.

2.  TESTABILITY  PARAMETERS 

In  this  section,  problems  and  critiques  of  testability  parameters  are 
presented  and  discussed.  Then,  a  general  testability  model  for  any  level  of 
repair  is  used  to  define  more  accurate  and  unambiguous  testability  parameters. 
Accordingly,  measures  of  effectiveness  of  the  multi-level  testability  systems  are 
developed  and  analytical  procedures  to  evaluate  these  measures  are  derived. 

2.1  Critique  of  Testability  Parameters 

In general, the problem with all known testability parameters can be traced
to inadequate and ambiguous definitions of terms, parameters, and their meanings.
Take, for example, the three definitions of Fraction of Faults Detected (FFD). In
particular, consider the terms $Q_{BDF}$ (quantity of faults detected by
BIT/ETE), $Q_{FD}$ (quantity of faults detected), and $Q_{yDF}$ (quantity of
faults detected through use of defined means). $Q_{BDF}$, $Q_{yDF}$, and $Q_{FD}$
have in some instances been calculated taking into account only detections caused
by actual faults. In addition, when we define $Q_{BDF}$, $Q_{FD}$, and $Q_{yDF}$,
do we mean all possible faults, or the faults which will occur over a period of
system operating life (in accord with failure rates)?

Furthermore,  it  is  observed  that  most  parameters  are  defined  and  determined 
as  if  the  levels  of  maintenance  have  nothing  to  do  with  them  and  their  values, 
which  is  not  true  since  test  tolerances  are  different  at  different  levels. 


2.2  General Testability Model for Any Level of Repair

At any level of repair $\ell$ (organizational, intermediate, or depot) the
testability system can be modeled as shown in Figure 2.1. A diagnostic group,
which is to be tested by the available test equipment at this level, contains
$n_\ell$ replaceable units (RU), with each RU$_i$ containing $m_i$
sub-replaceable units (SU).


Figure  2.1  A  Diagnostic  Group 


This model is general enough to accommodate the testability systems at any
level. At the organizational level, the diagnostic group is the prime equipment,
the replaceable unit is the Line Replaceable Unit (LRU), the sub-replaceable unit
is the module, and the test equipment is the BIT/ATE system. Also, $n_O = N$,
where $N$ is the number of LRUs in the prime equipment.

At the intermediate level, the diagnostic group is the LRU which has been
isolated at the organizational level, say LRU$_i$; the replaceable unit is the
module, the sub-replaceable unit is the component or part, and the test equipment
used at this level is the external test equipment (ETE). Also, $n_I = M_i$, where
$M_i$ is the number of modules in LRU$_i$.

At the depot, the diagnostic group is the module which has been isolated at
the intermediate level, say module $k$; the replaceable unit is the component or
unit, and the test equipment is the manual or semiautomatic test equipment at the
depot. Furthermore, $n_D = U_{ki}$, where $U_{ki}$ is the number of components in
module $k$ of LRU$_i$.

2.3  General  Testability  Procedures  at  Any  Level  of  Repair 

A general testability procedure at any level $\ell$ (organizational,
intermediate, or depot) is depicted in Figure 2.2, where the diagnostic system
can be in one of the following states:

a.  Successful Performance

The diagnostic system correctly detects and isolates the faulty RU if a
fault exists, or the diagnostic system reports no failure when the
diagnostic group is fault free.

b.  Failure to Report

A faulty diagnostic group is introduced at level $\ell$. However, the test
equipment could not report or verify the failure.

c.  Failure to Isolate RU

A faulty diagnostic group is introduced to level $\ell$, where the failure is
verified. However, the test equipment fails to isolate the faulty RU and
reports no failure instead.

d.  False RU Isolation

A faulty diagnostic group is introduced to level $\ell$, where the test
equipment verifies its failure. However, it isolates a good RU instead of
the faulty one.

e.  False Report

A good diagnostic group is introduced to level $\ell$. However, the test
equipment mistakenly reports a failure (false alarm) and isolates a good RU.

f.  Can Not Duplicate at level $\ell$ (CND$_\ell$)

A good diagnostic group is introduced to level $\ell$, where the test
equipment reports a failure (false alarm). However, in the isolation process
no faulty RU is found.
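
The six states can be read as a decision over three observations: whether the
diagnostic group is actually faulty, whether the test equipment reports a
failure, and which kind of RU (if any) it isolates. The following Python sketch
is illustrative only; the names `DiagnosticState` and `classify`, and the
three-argument encoding of an outcome, are assumptions introduced here and are
not part of the original model.

```python
from enum import Enum
from typing import Optional


class DiagnosticState(Enum):
    """States a-f of the diagnostic system at one level of repair."""
    SUCCESSFUL_PERFORMANCE = "a"
    FAILURE_TO_REPORT = "b"
    FAILURE_TO_ISOLATE = "c"
    FALSE_RU_ISOLATION = "d"
    FALSE_REPORT = "e"
    CAN_NOT_DUPLICATE = "f"


def classify(group_faulty: bool, failure_reported: bool,
             isolated: Optional[str]) -> DiagnosticState:
    """Map one test outcome to a state.

    `isolated` is "faulty", "good", or None (no RU isolated). This encoding
    is a hypothetical convenience, not notation from the report.
    """
    if not failure_reported:
        # No report: correct for a good group, a miss for a faulty one.
        if group_faulty:
            return DiagnosticState.FAILURE_TO_REPORT
        return DiagnosticState.SUCCESSFUL_PERFORMANCE
    if group_faulty:
        if isolated == "faulty":
            return DiagnosticState.SUCCESSFUL_PERFORMANCE
        if isolated == "good":
            return DiagnosticState.FALSE_RU_ISOLATION
        return DiagnosticState.FAILURE_TO_ISOLATE
    # Good group with a reported failure: a false alarm occurred.
    if isolated == "good":
        return DiagnosticState.FALSE_REPORT
    if isolated is None:
        return DiagnosticState.CAN_NOT_DUPLICATE
    raise ValueError("a good diagnostic group contains no faulty RU")
```

Enumerating the outcomes this way makes it easy to check that every
combination of fault condition and test result falls into exactly one state.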


2.3.1  Definitions 


Let 


$N$ = number of LRUs in the prime equipment

$M_i$ = number of modules in LRU$_i$

$U_{ki}$ = number of components or units in module $k$ of LRU$_i$

$\ell$ = level of repair: $\ell = O$ (organizational), $I$ (intermediate), or
$D$ (depot)

$n_\ell$ = number of replaceable units (RU) in the diagnostic group at level
$\ell$

$(\lambda_i)_\ell$ = failure rate of RU$_i$ at level $\ell$

$P(af_i)_\ell$ = proportion of all possible faults in RU$_i$
($i = 1, \ldots, n_\ell$) which are addressable by the test equipment at level
$\ell$

$P(d_i)_\ell$ = proportion of all addressable faults in RU$_i$ which can be
detected by the test equipment at level $\ell$

$P(FI_i)_\ell$ = probability that the test equipment at level $\ell$ will
correctly isolate the failure to $i$ or fewer RUs after detecting the failure,
given that the diagnostic group at level $\ell$ is actually faulty

$P(MB)_\ell$ = probability that any good RU at level $\ell$ will be mistakenly
isolated by the test equipment, given that the diagnostic group at this level
is actually faulty

$P(FA)_\ell$ = probability that the test equipment at level $\ell$ reports a
failure, given that the diagnostic group is not faulty

$P(MG)_\ell$ = probability that any good RU at level $\ell$ is mistakenly
isolated by the test equipment after the occurrence of a false alarm at this
level

2.3.2  Failure  of  Diagnostic  Groups 

Let $P(F)_O$ be the probability of equipment failure (at the organizational
level), $P(F)_I$ be the probability of introducing a faulty LRU to the
intermediate level, and $P(F)_D$ be the probability of introducing a faulty
module to the depot.

Assume that only one faulty unit can exist within the prime equipment
undergoing test. Let $P(f_i)_O$ be the probability of failure of LRU$_i$ at the
organizational level in operating/mission time $t$; then

$$P(F)_O = 1 - \text{Prob}[\text{equipment is good}] \qquad (2.1)$$

but

$$\begin{aligned}
\text{Prob}[\text{equipment is good}]
 &= \text{Prob}[\text{LRU}_1 \text{ is good and LRU}_2 \text{ is good} \ldots \text{ and LRU}_N \text{ is good}] \\
 &= P[\text{LRU}_1 \text{ is good}] \cdot P[\text{LRU}_2 \text{ is good}] \cdots P[\text{LRU}_N \text{ is good}] \\
 &= [1 - P(f_1)_O] \cdot [1 - P(f_2)_O] \cdots [1 - P(f_N)_O] \\
 &= \prod_{i=1}^{N} [1 - P(f_i)_O]
\end{aligned}$$

Substituting in equation 2.1,

$$P(F)_O = 1 - \prod_{i=1}^{N} [1 - P(f_i)_O] \qquad (2.2)$$


After initial wear-in, when the occurrences of failures are essentially
random, electronic LRUs and modules often demonstrate failure characteristics
that are described by the negative exponential distribution. Then,

$$P(f_i)_O = 1 - e^{-(\lambda_i)_O t}$$

where $(\lambda_i)_O$ is the failure rate of LRU$_i$ and $t$ is the
operating/mission time of any LRU$_i$. Substituting in equation 2.2,

$$P(F)_O = 1 - \prod_{i=1}^{N} e^{-(\lambda_i)_O t}$$


and from Figure 2.2,

$$P(F)_I = \frac{P(F)_O \, P(FD)_O \, P(FI_1)_O}
{P(F)_O \, P(FD)_O \, P(FI_1)_O + P(F)_O \, P(FD)_O \, P(FL_1)_O + [1 - P(F)_O] \, P(FA)_O \, P(WI_1)_O}$$

$$P(F)_D = \frac{P(F)_I \, P(FD)_I \, P(FI_1)_I}
{P(F)_I \, P(FD)_I \, P(FI_1)_I + P(F)_I \, P(FD)_I \, P(FL_1)_I + [1 - P(F)_I] \, P(FA)_I \, P(WI_1)_I}$$
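
As a numerical illustration of this subsection, the sketch below computes the
probability of equipment failure from exponential failure rates and then the
probability that a unit forwarded to the next level is actually faulty. It
assumes that the forwarded units are those isolated correctly, isolated by
misassignment, or isolated after a false alarm, as read from the expressions
derived from Figure 2.2. The function names, argument names, and all numeric
values are hypothetical and chosen only for the example.

```python
import math


def prob_group_failure(failure_rates, t):
    """P(F)_O = 1 - prod_i exp(-lambda_i * t) for independent LRUs."""
    return 1.0 - math.prod(math.exp(-lam * t) for lam in failure_rates)


def prob_faulty_at_next_level(p_f, p_fd, p_fi1, p_fl1, p_fa, p_wi1):
    """Probability that a unit sent to the next level is actually faulty.

    Numerator: the group is faulty, the fault is detected, and the faulty
    RU is correctly isolated. Denominator: every route by which an RU is
    removed (correct isolation, misassignment, false alarm).
    """
    correct = p_f * p_fd * p_fi1
    misassigned = p_f * p_fd * p_fl1
    false_alarm = (1.0 - p_f) * p_fa * p_wi1
    total = correct + misassigned + false_alarm
    if total == 0.0:
        raise ValueError("no RU is ever removed with these parameters")
    return correct / total


# Illustrative values only (failure rates per hour, 100-hour mission).
p_f_org = prob_group_failure([1e-4, 2e-4, 5e-5], t=100.0)
p_f_int = prob_faulty_at_next_level(p_f_org, 0.9, 0.8, 0.1, 0.05, 0.5)
print(f"P(F)_O = {p_f_org:.4f}, P(F)_I = {p_f_int:.4f}")
```

Applying `prob_faulty_at_next_level` a second time, with intermediate-level
parameters and `p_f_int` as the input, gives the corresponding depot value.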

2.3.3  Actual  Fault  Detection  Capability  of  the  Test  Equipment 

The actual fault detection capability of the test equipment at any level
$\ell$ should consider both faults which are addressable and those which are not
addressable by the test equipment. In addition, all addressable faults should be
limited to those faults which can occur during the diagnostic group operating
life. Let $P(af_i)_\ell$ be the proportion of all possible faults in RU$_i$ at
level $\ell$ which are addressable by the test equipment, $P(d_i)_\ell$ be the
proportion of all addressable faults in RU$_i$ which can be detected by the test
equipment at level $\ell$, and $P(fd_i)_\ell$ be the probability that the test
equipment will detect a fault in RU$_i$ at level $\ell$ given that RU$_i$ is
faulty. Then,

$$P(fd_i)_\ell = P(d_i)_\ell \, P(af_i)_\ell$$

Also let $P(RU_i|F)_\ell$ be the probability that a failure is in RU$_i$ at
level $\ell$ given that a failure exists; then

$$P(RU_i|F)_\ell = P[\text{RU}_i \text{ is faulty at level } \ell \mid \text{diagnostic group is faulty}]
 = \frac{(\lambda_i)_\ell}{\sum_{j=1}^{n_\ell} (\lambda_j)_\ell}$$

where $(\lambda_i)_\ell$ is the failure rate of RU$_i$ at level $\ell$, that is,
the total failure rate of all SU in RU$_i$ at level $\ell$. Since

$P(FD)_\ell$ = Prob[test equipment will detect a fault at level $\ell$ |
diagnostic group is faulty]

then

$$P(FD)_\ell = \sum_{i=1}^{n_\ell} P(RU_i|F)_\ell \, P(fd_i)_\ell$$

In this case $P(FD)_\ell$ is the actual detection capability of the test
equipment at level $\ell$ when a fault exists in the diagnostic group introduced
to this level.
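
The detection capability described above is a failure-rate-weighted average of
the per-RU detection probabilities. The sketch below illustrates that
calculation under one explicit assumption: the conditional probability that the
fault lies in a given RU is taken to be that RU's share of the total failure
rate of the diagnostic group. The function and argument names are hypothetical.

```python
def detection_capability(failure_rates, p_addressable, p_detectable):
    """Actual fault detection capability P(FD) at one level of repair.

    failure_rates[i]  : failure rate of RU i
    p_addressable[i]  : proportion of all possible faults in RU i that the
                        test equipment can address
    p_detectable[i]   : proportion of addressable faults in RU i that the
                        test equipment can detect
    Assumes P(fault is in RU i | group faulty) = rate_i / sum of rates.
    """
    if not (len(failure_rates) == len(p_addressable) == len(p_detectable)):
        raise ValueError("all inputs must have one entry per RU")
    total_rate = sum(failure_rates)
    if total_rate <= 0.0:
        raise ValueError("total failure rate must be positive")
    return sum(
        (lam / total_rate) * paf * pd
        for lam, paf, pd in zip(failure_rates, p_addressable, p_detectable)
    )


# Two RUs with equal failure rates; only half of the second RU's faults
# are addressable, so overall capability falls well below 0.8.
p_fd = detection_capability([1.0, 1.0], [1.0, 0.5], [0.8, 0.8])
print(f"P(FD) = {p_fd:.2f}")
```

The example shows why non-addressable faults matter: a nominal 80% detection
figure on addressable faults corresponds to a lower actual capability once the
unaddressed faults are counted.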

2.3.4  Actual  Fault  Isolation  Capability  of  the  Test  Equipments 

a.  Correct Isolation

The correct isolation capability of a diagnostic system can be defined
by $P(FI_i)_\ell$, where

$P(FI_i)_\ell$ = Prob[failure will be isolated to $i$ or fewer RUs at level
$\ell$ | a fault is detected and the diagnostic group is faulty]

b.  Misassignment

Assuming that any good RU at any level has the same chance of being
mistakenly isolated (misassignment), let

$P(MB)_\ell$ = Prob[any good RU at level $\ell$ will be mistakenly isolated |
diagnostic group is faulty]

Since misassignments can occur independently of each other in the
$n_\ell - 1$ good RUs at level $\ell$, the number of misassignments is a
discrete random variable with a binomial probability distribution. Let

$P(FL_i)_\ell$ = Prob[$i$ or fewer good RUs will be mistakenly isolated at
level $\ell$ | a fault is detected and the diagnostic group is faulty],
$i > 0$

then

$$P(FL_i)_\ell = \sum_{k=1}^{i} \binom{n_\ell - 1}{k} \left(P(MB)_\ell\right)^{k} \left(1 - P(MB)_\ell\right)^{n_\ell - 1 - k}$$

When the specification combines the isolation of good RUs and no
isolation of any RU together, then

$$P(FL_1)_\ell = 1 - P(FI_1)_\ell$$

c.  No Isolation

The probability of no isolation of any RU, good or faulty, given that the
diagnostic group is faulty at level $\ell$, $P(NI)_\ell$, can be computed
using $P(FI_1)_\ell$ and $P(FL_1)_\ell$, where

$$P(NI)_\ell = 1 - P(FL_1)_\ell - P(FI_1)_\ell$$
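
The misassignment probability is a partial sum of a binomial distribution over
the good RUs in a faulty diagnostic group. The following sketch evaluates it,
assuming the sum runs from one misassignment up to $i$ and that there are
$n_\ell - 1$ good RUs; the function names and the example numbers are
hypothetical.

```python
from math import comb


def prob_misassignment(i, n, p_mb):
    """P(FL_i): between 1 and i of the n - 1 good RUs are mistakenly isolated.

    n    : number of RUs in the (faulty) diagnostic group
    p_mb : probability that any one good RU is mistakenly isolated
    """
    good = n - 1  # exactly one RU is assumed faulty
    upper = min(i, good)
    return sum(
        comb(good, k) * p_mb**k * (1.0 - p_mb) ** (good - k)
        for k in range(1, upper + 1)
    )


def prob_no_isolation(p_fl1, p_fi1):
    """P(NI) = 1 - P(FL_1) - P(FI_1): neither a good nor the faulty RU isolated."""
    return 1.0 - p_fl1 - p_fi1


# Three RUs, so two are good; each good RU is misassigned with probability 0.1.
p_fl1 = prob_misassignment(1, n=3, p_mb=0.1)
p_ni = prob_no_isolation(p_fl1, p_fi1=0.8)
print(f"P(FL_1) = {p_fl1:.2f}, P(NI) = {p_ni:.2f}")
```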


2.3.5  False  Alarm 

False alarm can be measured by $P(FA)_\ell$, where

$P(FA)_\ell$ = Prob[test equipment detects a failure at level $\ell$ |
diagnostic group is good] = Prob[false alarm at level $\ell$]

As a result of the false alarm, false isolation can occur, which can be measured
by $P(WI_i)_\ell$ and $P(MG)_\ell$, where

$P(WI_i)_\ell$ = Prob[$i$ or fewer good RUs will be isolated at level $\ell$ |
a fault is detected and the diagnostic group is good], $i \geq 1$

$P(MG)_\ell$ = Prob[any good RU will be mistakenly isolated at level $\ell$ |
diagnostic group is good]

Since false isolation can occur independently in each of the $n_\ell$ good RUs
at level $\ell$, the number of falsely isolated RUs is a discrete random
variable with a binomial probability distribution such that

$$P(WI_i)_\ell = \sum_{k=1}^{i} \binom{n_\ell}{k} \left(P(MG)_\ell\right)^{k} \left(1 - P(MG)_\ell\right)^{n_\ell - k}, \qquad i \geq 1$$


Figure 2.2  General Testability Procedures


The CND at level $\ell$ can be measured by $P(CND)_\ell$, where

$$\begin{aligned}
P(CND)_\ell &= \text{Prob}[\text{isolating no RU at level } \ell \mid \text{false alarm}] \\
 &= \binom{n_\ell}{0} \left(P(MG)_\ell\right)^{0} \left(1 - P(MG)_\ell\right)^{n_\ell} \\
 &= \left(1 - P(MG)_\ell\right)^{n_\ell}
\end{aligned}$$
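
After a false alarm, every RU in the group is good, so the number of falsely
isolated RUs follows a binomial distribution over all $n_\ell$ units, and the
Can Not Duplicate event is the zero-isolation term. The sketch below computes
both quantities and checks that they are complementary when $i = n_\ell$. It
assumes the false isolation sum starts at one isolated RU; names and values are
hypothetical.

```python
from math import comb


def prob_false_isolation(i, n, p_mg):
    """P(WI_i): between 1 and i of the n good RUs are isolated after a false alarm."""
    upper = min(i, n)
    return sum(
        comb(n, k) * p_mg**k * (1.0 - p_mg) ** (n - k)
        for k in range(1, upper + 1)
    )


def prob_cnd(n, p_mg):
    """P(CND): no RU is isolated after a false alarm, (1 - P(MG))^n."""
    return (1.0 - p_mg) ** n


# Illustrative values: four RUs, each falsely isolated with probability 0.2.
n_units, p_mg = 4, 0.2
p_cnd = prob_cnd(n_units, p_mg)
p_wi_all = prob_false_isolation(n_units, n_units, p_mg)
print(f"P(CND) = {p_cnd:.4f}, P(WI_n) = {p_wi_all:.4f}")
```

Because the two outcomes exhaust the possibilities after a false alarm,
`p_cnd + p_wi_all` equals one up to floating-point rounding, which is a
convenient sanity check on any parameter set.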

2.4  Measures  of  Multi-Level  Diagnostic  System  Effectiveness 

Since the existence of many different parameters leads to problems in system
optimization, it would seem desirable to be able to group more than one
parameter logically into a single measure. Several attempts at this have been
made; all have been less than entirely valid from a mathematical/engineering
standpoint. For example, many automatic fault detection/fault isolation systems
use only one figure of merit, FD/FI, as an indication of the diagnostic system
capability. A figure of 90%/80% means that 90% of those malfunctions addressable
by the FD/FI capability are detected, and of those detected 80% are isolated.
Since the percentages of faults detected and isolated are considered
independent, it can be concluded that 72% of the addressable malfunctions can be
isolated. This figure is misleading since it disregards the undetected faults,
it ignores the possibility that fault detection is not necessarily independent
of fault isolation, and it is ambiguous with respect to how false alarms, false
isolation, and CND are to be interpreted.
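
The arithmetic behind the 90%/80% figure can be made explicit. The sketch
below reproduces the 72% value and then shows how the figure shrinks once
non-addressable faults are counted. The addressable fraction of 0.85 is an
assumed value used purely for illustration, and the function names are
hypothetical.

```python
def isolated_fraction_of_addressable(fd, fi):
    """Fraction of addressable malfunctions that are detected and then isolated."""
    return fd * fi


def isolated_fraction_of_all(addressable, fd, fi):
    """Same fraction, expressed against all malfunctions rather than addressable ones."""
    return addressable * fd * fi


fd, fi = 0.90, 0.80
nominal = isolated_fraction_of_addressable(fd, fi)
# Assumed for illustration: only 85% of malfunctions are addressable at all.
overall = isolated_fraction_of_all(0.85, fd, fi)
print(f"nominal: {nominal:.0%}, against all malfunctions: {overall:.1%}")
```

Even this corrected figure still says nothing about false alarms, false
isolation, or CND events, which is the motivation for the separate measures
introduced in this section.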

In addition, automatic detection and isolation equipment in the form of
built-in-test equipment and removable/replaceable modules was primarily
introduced to sophisticated weapon systems in order to improve and support the
availability and maintainability of these systems, decrease the maintenance
burden, and provide an alternative to the rising costs of training, high
personnel turnover, and the increase in resources necessary for system support.
However, the experience with those diagnostic systems has not lived up to
expectation, for lack of an effective operational testing methodology which can
accurately assess the system's real diagnostic capability and its contribution
to overall weapon system availability.

Furthermore,  when  selecting  a  measure  of  effectiveness  we  should  keep  in  mind 
that  the  measure  will  have  little  value  without  certain  essential  characteristics. 
Probably  the  most  important  characteristic  is  that  the  measure  be  expressed 
quantitatively.  We  should  be  able  to  reduce  it  to  a  number  such  that  comparisons 
between  alternative  designs  can  be  made.  Further,  the  measure  we  choose  must  have 
a  basis  in  physical  reality.  Thus,  it  should  be  descriptive  of  the  real  problem, 
neither  exaggerated  nor  over-simplified.  Yet  at  the  same  time  the  measure  should 
be  simple  enough  to  allow  for  mathematical  manipulation. 

In this section, three measures of effectiveness of the multi-level diagnostic
system are presented. They are derived from the actual system requirements in
order to accurately represent the system's real diagnostic capability. They are
called α, β, and γ errors.

a. False Removal (α error)

At any level, if the diagnostic group is not faulty, then the diagnostic
system should not report or isolate any good RU. If it does report/isolate
a good RU where no failure exists in the diagnostic group, the diagnostic
system commits an α error. This error represents the occurrence of
intermittent and temporary faults as well as the potential of the test
equipment either to cause malfunction in the diagnostic group or to work
improperly.

b. Failure to Diagnose (β error)

At any level, if the diagnostic group is faulty, then the main objective
of the diagnostic system is to detect and isolate the faulty RU. So, if a
fault occurs and the test equipment fails to report or isolate the faulty
RU, or it isolates a good RU instead, then a β error is committed. This
error reflects the failure of the testing system to perform its major
objective of detecting and isolating faults when they occur.

c. Lack of Precision (γ error)

If the diagnostic group at any level is not faulty, the test equipment at
this level should report no failure or isolate no good RU, which is a
correct action. However, if that occurs after mistakenly reporting a
failure or isolating an RU either at the same level or at any lower level,
then a γ error is committed. Simply put, this error comprises the CNDs and
RTOKs at different levels.

α and β errors represent the accuracy of the diagnostic system and the ability
of test equipment at each level to perform accurately according to
specifications without errors.

A γ error represents the precision of the testability system and the ability of
different test equipment at the same level or different levels to repeat the
same results according to its tolerance and precision.

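2.5 Testability Effectiveness Measures at Level ℓ

α, β, and γ errors at any level ℓ (organizational, intermediate, or depot) can
be developed using the decision tree in Figure 2.2 as follows:

\[
\begin{aligned}
\alpha_\ell &= P[\text{false RU detection and/or isolation at level } \ell \mid \text{diagnostic group is good}] \\
\beta_\ell &= P[\text{failure to detect and/or isolate the faulty RU at level } \ell \mid \text{diagnostic group is faulty}] \\
\gamma_\ell &= P[\text{correct action of not isolating a good RU at level } \ell \text{ after reporting its failure} \mid \text{diagnostic group is good}] \\
\delta_\ell &= P[\text{successful performance of the test equipment at level } \ell]
\end{aligned}
\]

where,

\[
\begin{aligned}
\alpha_\ell &= P[\text{false RU isolation} \mid \text{diagnostic group is good}] \\
&= [1 - P(F)_\ell]\, P(FA)_\ell\, P(WI_i)_\ell
\end{aligned}
\]

\[
\begin{aligned}
\beta_\ell &= P[\text{failure to report}] + P[\text{failure to isolate RU}] + P[\text{false isolation}] \\
&= [P(F)_\ell - P(F)_\ell P(FD)_\ell]
 + [P(F)_\ell P(FD)_\ell - P(F)_\ell P(FD)_\ell P(FI_i)_\ell - P(F)_\ell P(FD)_\ell P(FL_i)_\ell]
 + P(F)_\ell P(FD)_\ell P(FL_i)_\ell \\
&= P(F)_\ell - P(F)_\ell P(FD)_\ell P(FI_i)_\ell \\
&= P(F)_\ell\, [1 - P(FD)_\ell P(FI_i)_\ell]
\end{aligned}
\]

\[
\begin{aligned}
\gamma_\ell &= CND_\ell \\
&= [1 - P(F)_\ell]\, P(FA)_\ell - [1 - P(F)_\ell]\, P(FA)_\ell\, P(WI_i)_\ell \\
&= [1 - P(F)_\ell]\, P(FA)_\ell\, [1 - P(WI_i)_\ell]
\end{aligned}
\]

\[
\begin{aligned}
\delta_\ell &= P[\text{isolating the faulty RU} \mid \text{diagnostic group is faulty}]
 + P[\text{report no failure} \mid \text{diagnostic group is good}] \\
&= 1 - P(F)_\ell - [1 - P(F)_\ell]\, P(FA)_\ell + P(F)_\ell P(FD)_\ell P(FI_i)_\ell \\
&= [1 - P(F)_\ell]\,[1 - P(FA)_\ell] + P(F)_\ell P(FD)_\ell P(FI_i)_\ell
\end{aligned}
\]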
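The single-level measures can be evaluated numerically with a short sketch like the one below. It is an illustration rather than the report's own procedure: the class and field names are hypothetical, and α is assumed to carry the wrong-isolation factor P(WI_i), which is what makes it vanish when P(WI_i) = 0 in the special case of Section 2.5.1. Under that assumption the four measures partition the outcomes of the decision tree and sum to one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LevelParams:
    """Illustrative container; field names are not from the report."""
    p_f: float   # P(F): diagnostic group is faulty
    p_fd: float  # P(FD): fault is detected, given a faulty group
    p_fi: float  # P(FI_i): faulty RU is isolated, given detection
    p_fa: float  # P(FA): false alarm, given a good group
    p_wi: float  # P(WI_i): a good RU is isolated, given a false alarm


def level_measures(p: LevelParams) -> dict:
    """Return alpha, beta, gamma, delta at one maintenance level."""
    good = 1.0 - p.p_f
    correct = p.p_f * p.p_fd * p.p_fi  # faulty RU detected and isolated
    return {
        "alpha": good * p.p_fa * p.p_wi,          # false removal (assumed form)
        "beta": p.p_f - correct,                  # failure to diagnose
        "gamma": good * p.p_fa * (1.0 - p.p_wi),  # CND at this level
        "delta": good * (1.0 - p.p_fa) + correct, # successful performance
    }


example = level_measures(
    LevelParams(p_f=0.1, p_fd=0.9, p_fi=0.8, p_fa=0.05, p_wi=0.5)
)
perfect_depot = level_measures(
    LevelParams(p_f=0.1, p_fd=0.9, p_fi=1.0, p_fa=0.05, p_wi=0.0)
)
```

The `perfect_depot` case sets P(WI_i) = 0 and P(FI_i) = 1; the numeric values are arbitrary and chosen only for illustration.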

2.5.1 Special Case

When the depot repair cycle is perfect, as in Figure 2.3, then for every faulty
module the faulty component will be isolated, and for every good module no
isolation results. The above formulas remain the same except that

\[ P(WI_i)_D = 0 \quad \text{and} \quad P(FI_i)_D = 1 \]

hence,

\[
\begin{aligned}
\alpha_D &= 0 \\
\beta_D &= P(F)_D\, [1 - P(FD)_D] \\
\gamma_D &= [1 - P(F)_D]\, P(FA)_D \\
\delta_D &= P(F)_D\, P(FD)_D + [1 - P(F)_D]\,[1 - P(FA)_D]
\end{aligned}
\]


2.6 Testability Procedures for the Organizational/Intermediate System

If the major concern is with testability at the organizational and intermediate
levels taken as one unit, then the testability system can be presented as in
Figure 2.4. The states of this composite system differ from those obtained by
studying each level separately. Accordingly, new parameters are considered and
some other parameters are redefined to fit the new system. The diagnostic system
can be in one of the following states:

a. Failure to report (organizational)

The prime equipment is faulty. However, the BIT/ATE at the
organizational level reports no failure.

b. Failure to report (intermediate)

The prime equipment is faulty and the faulty LRU is isolated at the
organizational level. However, ETE at the intermediate level reports no
failure in the isolated faulty LRU.

c. Failure to isolate LRU

The prime equipment is faulty and the BIT/ATE detects a failure at the
organizational level. However, it fails to isolate any LRU.

d. Failure to isolate module

The prime equipment is faulty and the faulty LRU is isolated at the
organizational level. However, at the intermediate level, ETE verifies
the LRU failure but fails to isolate any module.

e. Successful FD/FI (organizational/intermediate system)

Isolating the faulty module if the prime equipment is faulty, or
reporting no failure if the prime equipment is fault free.

f. False module isolation

The prime equipment is faulty and the faulty LRU is isolated at the
organizational level. However, at the intermediate level a good module
is isolated instead of the faulty one.

g. False report (intermediate)

1) The prime equipment is faulty and a good LRU is isolated at the
organizational level. At the intermediate level, ETE indicates a
failure in the isolated good LRU and isolates a good module.

2) The prime equipment is good, and a good LRU is isolated at the
organizational level as a result of a false alarm. At the
intermediate level, ETE indicates a failure in the isolated good LRU
and isolates a good module.

h. No fault isolation

The prime equipment is faulty and a good LRU is isolated at the
organizational level. At the intermediate level, ETE indicates no failure
in the isolated good LRU.

i. No module isolation

The prime equipment is faulty and a good LRU is isolated at the
organizational level. At the intermediate level, ETE indicates a failure
in the isolated good LRU but isolates no module.

j. Can Not Duplicate (organizational level), CND_O

The prime equipment is good, and BIT/ATE reports a failure at the
organizational level (false alarm). However, it isolates no LRU.

k. Can Not Duplicate (intermediate level), CND_I

The prime equipment is good, and a good LRU is isolated as a result of a
false alarm. At the intermediate level, ETE indicates a failure in the
isolated good LRU. However, it isolates no module.

l. Re-Test OK (intermediate level), RTOK_I

The prime equipment is good, and a good LRU is mistakenly isolated as a
result of a false alarm. However, at the intermediate level, ETE reports
no failure in the isolated LRU.


2.7 Testability Effectiveness Measures for the Organizational/Intermediate System

α, β, and γ errors for the organizational/intermediate system are developed
using the decision tree in Figure 2.4 as follows:

\[
\begin{aligned}
\alpha_{OI} &= P[\text{false report and/or isolation} \mid \text{equipment is good}] \\
\beta_{OI} &= P[\text{failure to correctly detect and/or isolate the faulty unit} \mid \text{equipment is faulty}] \\
\gamma_{OI} &= P[\text{correct action of not isolating LRU and/or module after isolating a good LRU}] \\
\delta_{OI} &= P[\text{successful performance of the diagnostic system in the organizational and intermediate levels as a whole}]
\end{aligned}
\]

where,

\[
\begin{aligned}
\alpha_{OI} &= P[\text{false report (intermediate)}] \\
&= [1 - P(F)_O]\, P(WI_i)_O\, P(FA)_O\, P(FA)_I\, P(WI_i)_I
\end{aligned}
\]

\[
\begin{aligned}
\beta_{OI} ={}& P[\text{failure to report (org.)}] + P[\text{failure to report (int.)}] + P[\text{failure to isolate LRU}] \\
&+ P[\text{failure to isolate module}] + P[\text{false module isolation}] + P[\text{false report (intermediate)}] \\
&+ P[\text{no fault isolation}] + P[\text{no module isolation}] \\
={}& [P(F)_O - P(F)_O P(FD)_O] \\
&+ [P(F)_O P(FD)_O P(FI_i)_O - P(F)_O P(FD)_O P(FI_i)_O P(FD)_I] \\
&+ [P(F)_O P(FD)_O - P(F)_O P(FD)_O P(FI_i)_O - P(F)_O P(FD)_O P(FL_i)_O] \\
&+ [P(F)_O P(FD)_O P(FI_i)_O P(FD)_I - P(F)_O P(FD)_O P(FI_i)_O P(FD)_I P(FI_i)_I
    - P(F)_O P(FD)_O P(FI_i)_O P(FD)_I P(FL_i)_I] \\
&+ [P(F)_O P(FD)_O P(FI_i)_O P(FD)_I P(FL_i)_I] \\
&+ [P(F)_O P(FD)_O P(FL_i)_O P(FA)_I P(WI_i)_I] \\
&+ [P(F)_O P(FD)_O P(FL_i)_O - P(F)_O P(FD)_O P(FL_i)_O P(FA)_I] \\
&+ [P(F)_O P(FD)_O P(FL_i)_O P(FA)_I - P(F)_O P(FD)_O P(FL_i)_O P(FA)_I P(WI_i)_I] \\
={}& P(F)_O - P(F)_O P(FD)_O P(FI_i)_O P(FD)_I P(FI_i)_I
\end{aligned}
\]

\[
\begin{aligned}
\gamma_{OI} ={}& CND_O + CND_I + RTOK_I \\
={}& \{[1 - P(F)_O] P(FA)_O - [1 - P(F)_O] P(FA)_O P(WI_i)_O\} \\
&+ \{[1 - P(F)_O] P(WI_i)_O P(FA)_O P(FA)_I - [1 - P(F)_O] P(WI_i)_O P(FA)_O P(FA)_I P(WI_i)_I\} \\
&+ \{[1 - P(F)_O] P(WI_i)_O P(FA)_O - [1 - P(F)_O] P(WI_i)_O P(FA)_O P(FA)_I\} \\
={}& [1 - P(F)_O] P(FA)_O - [1 - P(F)_O] P(FA)_O P(WI_i)_O P(FA)_I P(WI_i)_I
\end{aligned}
\]

\[
\begin{aligned}
\delta_{OI} ={}& P[\text{isolating the faulty module} \mid \text{prime equipment is faulty}]
 + P[\text{reporting no failure} \mid \text{prime equipment is good}] \\
={}& P(F)_O P(FD)_O P(FI_i)_O P(FD)_I P(FI_i)_I + [1 - P(F)_O]\,[1 - P(FA)_O]
\end{aligned}
\]
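One way to check the organizational/intermediate expressions is to enumerate the terminal states of the decision tree and add up their path probabilities. The sketch below does this under the assumption that a good LRU arriving at the intermediate level is governed by P(FA)_I and P(WI_i)_I, as the β terms above imply. The function, dictionary keys, and parameter values are all hypothetical.

```python
def oi_state_probabilities(org: dict, inter: dict) -> dict:
    """Path probability of each terminal state of the O/I decision tree."""
    f, fd_o, fi_o, fl_o = org["f"], org["fd"], org["fi"], org["fl"]
    fa_o, wi_o = org["fa"], org["wi"]
    fd_i, fi_i, fl_i = inter["fd"], inter["fi"], inter["fl"]
    fa_i, wi_i = inter["fa"], inter["wi"]
    good = 1.0 - f
    return {
        # prime equipment faulty
        "failure_to_report_org": f * (1 - fd_o),
        "failure_to_isolate_lru": f * fd_o * (1 - fi_o - fl_o),
        "failure_to_report_int": f * fd_o * fi_o * (1 - fd_i),
        "failure_to_isolate_module": f * fd_o * fi_o * fd_i * (1 - fi_i - fl_i),
        "false_module_isolation": f * fd_o * fi_o * fd_i * fl_i,
        "false_report_int_faulty": f * fd_o * fl_o * fa_i * wi_i,
        "no_fault_isolation": f * fd_o * fl_o * (1 - fa_i),
        "no_module_isolation": f * fd_o * fl_o * fa_i * (1 - wi_i),
        "success_faulty": f * fd_o * fi_o * fd_i * fi_i,
        # prime equipment good
        "success_good": good * (1 - fa_o),
        "cnd_org": good * fa_o * (1 - wi_o),
        "rtok_int": good * fa_o * wi_o * (1 - fa_i),
        "cnd_int": good * fa_o * wi_o * fa_i * (1 - wi_i),
        "false_report_int_good": good * fa_o * wi_o * fa_i * wi_i,
    }


BETA_STATES = (
    "failure_to_report_org", "failure_to_report_int", "failure_to_isolate_lru",
    "failure_to_isolate_module", "false_module_isolation",
    "false_report_int_faulty", "no_fault_isolation", "no_module_isolation",
)

org = {"f": 0.1, "fd": 0.9, "fi": 0.8, "fl": 0.1, "fa": 0.05, "wi": 0.6}
inter = {"fd": 0.95, "fi": 0.85, "fl": 0.05, "fa": 0.1, "wi": 0.5}
states = oi_state_probabilities(org, inter)

alpha = states["false_report_int_good"]
beta = sum(states[name] for name in BETA_STATES)
gamma = states["cnd_org"] + states["cnd_int"] + states["rtok_int"]
delta = states["success_faulty"] + states["success_good"]
```

Summing the eight β states this way reproduces the closed form P(F)_O − P(F)_O P(FD)_O P(FI_i)_O P(FD)_I P(FI_i)_I, and the four measures together account for every terminal state.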

2.8 Testability Procedures for the Organizational/Intermediate/Depot System

In this case the major concern is with testability at the organizational,
intermediate, and depot levels taken as one system. This testability system is
presented in Figure 2.5. Some of the states of this system are exactly the same
as the ones in Section 2.6. Among them are failure to report (organizational),
failure to isolate LRU, failure to report (intermediate), failure to isolate
module, no fault isolation, no module isolation, CND_O, CND_I, and RTOK_I. In
addition, the system can be in one of the following states:

a. Failure to report (depot)

The prime equipment is faulty, the faulty LRU is isolated at the
organizational level, and the faulty module is isolated at the
intermediate level. However, test equipment at the depot reports no
failure in the isolated faulty module.

b. Failure to isolate units

The prime equipment is faulty, and the faulty LRU and faulty module are
correctly isolated at the organizational and intermediate levels.
However, at the depot, module failure is verified but no unit is isolated.

Figure 2.5 Testability Procedures for the Composite Organizational/
Intermediate/Depot System

c. Successful FD/FI (organizational/intermediate/depot system)

Isolating the faulty component/unit at the depot if the prime equipment
is faulty, or reporting no failure at the organizational level if the
prime equipment is fault free.

d. False Unit Isolation

The prime equipment is faulty, and the faulty LRU and faulty module are
isolated at the organizational and intermediate levels. However, at the
depot a good unit is isolated instead of the faulty one.

e. No Unit Report

The prime equipment is faulty, and a good module is isolated at the
intermediate level after isolating either the faulty LRU or a good LRU at
the organizational level. Then, tests at the depot report no module
failure.

f. No Unit Isolation

The prime equipment is faulty, and a good module is isolated at the
intermediate level after isolating either a good or the faulty LRU at the
organizational level. However, tests at the depot report module failure
but fail to isolate any units.

g. False Report (depot)

A good module is introduced to the depot (as a result of either false
module isolation or false report at the intermediate level), where tests
report module failure and isolate good units.

h. Can Not Duplicate (depot), CND_D

The prime equipment is good, and a good module is introduced to the depot
as a result of module false report at the intermediate level. However,
tests at the depot report module failure, then isolate no units.

i. Re-Test OK (depot), RTOK_D

The prime equipment is good, and a good module is introduced to the depot
as a result of module false report at the intermediate level. However,
tests at the depot report no failure.

2.9 Testability Effectiveness Measures for the Organizational/Intermediate/Depot
System

α, β, and γ errors for the organizational/intermediate/depot system are derived
using the decision tree in Figure 2.5 as follows:

\[
\begin{aligned}
\alpha &= P[\text{false report and/or isolation} \mid \text{prime equipment is good}] \\
\beta &= P[\text{failure to detect and/or isolate the faulty unit} \mid \text{prime equipment is faulty}] \\
\gamma &= P[\text{correct action of not isolating LRU, module and/or unit after mistakenly detecting and/or isolating a good LRU or module}] \\
\delta &= P[\text{successful performance of the diagnostic system as a whole}]
\end{aligned}
\]

\[
\alpha = [1 - P(F)_O]\, P(FA)_O P(WI_i)_O\, P(FA)_I P(WI_i)_I\, P(FA)_D P(WI_i)_D
\]

\[
\beta = P(F)_O - [P(F)_O P(FD)_O P(FI_i)_O\, P(FD)_I P(FI_i)_I\, P(FD)_D P(FI_i)_D]
\]

\[
\begin{aligned}
\gamma ={}& [1 - P(F)_O] P(FA)_O [1 - P(WI_i)_O]
 + [1 - P(F)_O] P(FA)_O P(WI_i)_O [1 - P(FA)_I] \\
&+ [1 - P(F)_O] P(FA)_O P(WI_i)_O P(FA)_I [1 - P(WI_i)_I] \\
&+ [1 - P(F)_O] P(FA)_O P(WI_i)_O P(FA)_I P(WI_i)_I [1 - P(FA)_D] \\
&+ [1 - P(F)_O] P(FA)_O P(WI_i)_O P(FA)_I P(WI_i)_I P(FA)_D [1 - P(WI_i)_D] \\
={}& [1 - P(F)_O] P(FA)_O [1 - P(WI_i)_O + P(WI_i)_O - P(WI_i)_O P(FA)_I] \\
&+ [1 - P(F)_O] P(FA)_O P(WI_i)_O P(FA)_I [1 - P(WI_i)_I + P(WI_i)_I - P(WI_i)_I P(FA)_D] \\
&+ [1 - P(F)_O] P(FA)_O P(WI_i)_O P(FA)_I P(WI_i)_I P(FA)_D [1 - P(WI_i)_D] \\
={}& [1 - P(F)_O] P(FA)_O [1 - P(WI_i)_O P(FA)_I] \\
&+ [1 - P(F)_O] P(FA)_O P(WI_i)_O P(FA)_I [1 - P(WI_i)_I P(FA)_D] \\
&+ [1 - P(F)_O] P(FA)_O P(WI_i)_O P(FA)_I P(WI_i)_I P(FA)_D [1 - P(WI_i)_D] \\
={}& [1 - P(F)_O] P(FA)_O - [1 - P(F)_O] P(FA)_O P(WI_i)_O P(FA)_I P(WI_i)_I P(FA)_D P(WI_i)_D \\
={}& [1 - P(F)_O] P(FA)_O\, [1 - P(WI_i)_O P(FA)_I P(WI_i)_I P(FA)_D P(WI_i)_D]
\end{aligned}
\]

\[
\delta = [1 - P(F)_O]\,[1 - P(FA)_O] + P(F)_O P(FD)_O P(FI_i)_O\, P(FD)_I P(FI_i)_I\, P(FD)_D P(FI_i)_D
\]

In the special case when the depot repair cycle is perfect, as in Figure 2.6,

\[ P(WI_i)_D = 0, \qquad P(FI_i)_D = 1 \]

and therefore

\[
\begin{aligned}
\alpha &= 0 \\
\beta &= P(F)_O - [P(F)_O P(FD)_O P(FI_i)_O\, P(FD)_I P(FI_i)_I\, P(FD)_D] \\
\gamma &= [1 - P(F)_O]\, P(FA)_O \\
\delta &= [1 - P(F)_O]\,[1 - P(FA)_O] + P(F)_O P(FD)_O P(FI_i)_O\, P(FD)_I P(FI_i)_I\, P(FD)_D
\end{aligned}
\]
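Because each closed form in this section is a product over levels, the three-level measures can be computed with a small helper that works for any ordered chain of levels. The sketch below is an illustration only; the tuple layout and function name are assumptions, and the numeric values are arbitrary.

```python
from math import prod


def chain_measures(p_f: float, levels: list) -> dict:
    """alpha, beta, gamma, delta for a chain of maintenance levels.

    p_f:    P(F)_O, probability that the prime equipment is faulty
    levels: (P(FD), P(FI_i), P(FA), P(WI_i)) tuples ordered
            organizational, intermediate, depot (assumed layout)
    """
    good = 1.0 - p_f
    fa_org = levels[0][2]
    # Faulty item correctly detected and isolated at every level.
    correct_chain = prod(fd * fi for fd, fi, _, _ in levels)
    # False alarm followed by wrong isolation at every level.
    false_chain = prod(fa * wi for _, _, fa, wi in levels)
    alpha = good * false_chain
    return {
        "alpha": alpha,
        "beta": p_f * (1.0 - correct_chain),
        "gamma": good * fa_org - alpha,
        "delta": good * (1.0 - fa_org) + p_f * correct_chain,
    }


general = chain_measures(
    0.1, [(0.9, 0.8, 0.05, 0.6), (0.95, 0.85, 0.1, 0.5), (0.98, 0.9, 0.02, 0.3)]
)
# Perfect depot repair cycle: P(FI_i)_D = 1 and P(WI_i)_D = 0.
perfect = chain_measures(
    0.1, [(0.9, 0.8, 0.05, 0.6), (0.95, 0.85, 0.1, 0.5), (0.98, 1.0, 0.02, 0.0)]
)
```

With the perfect-depot parameters, α drops to zero and γ reduces to [1 − P(F)_O] P(FA)_O, in line with the special case above.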

2.10 Another Definition for α, β, and γ Errors

These new definitions of α, β, and γ errors capitalize on the importance of
false removals of RUs at any level in affecting the maintainability and
availability of the system.

Let

\[ \alpha^*_\ell = P[\text{unnecessary removal of a good RU from the diagnostic group at level } \ell] \]

Let β*_ℓ be the probability of not detecting and/or not isolating a faulty RU at
any level ℓ (excluding cases where good RUs are isolated):

\[ \beta^*_\ell = P[\text{detecting or isolating no fault} \mid \text{diagnostic group is faulty}] \]

Let γ*_ℓ be the probability of the correct action of not isolating a good RU
after reporting its failure (either at the same level or at a lower level):

\[ \gamma^*_\ell = P[\text{correct action of not isolating a good RU after reporting its failure} \mid \text{diagnostic group is good}] \]

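2.11 Testability Effectiveness Measures at Any Level of Repair

According to the above definitions of α*, β*, and γ* errors and using Figure 2.2,
testability effectiveness measures at any level ℓ (organizational,
intermediate, or depot) can be derived as follows:

\[
\begin{aligned}
\alpha^*_\ell &= P[\text{false RU report}] + P[\text{false RU isolation}] \\
&= [1 - P(F)_\ell]\, P(FA)_\ell\, P(WI_i)_\ell + P(F)_\ell\, P(FD)_\ell\, P(FL_i)_\ell
\end{aligned}
\]

\[
\begin{aligned}
\beta^*_\ell &= P[\text{failure to report}] + P[\text{failure to isolate RU}] \\
&= [P(F)_\ell - P(F)_\ell P(FD)_\ell]
 + [P(F)_\ell P(FD)_\ell - P(F)_\ell P(FD)_\ell P(FL_i)_\ell - P(F)_\ell P(FD)_\ell P(FI_i)_\ell] \\
&= P(F)_\ell - P(F)_\ell P(FD)_\ell P(FL_i)_\ell - P(F)_\ell P(FD)_\ell P(FI_i)_\ell \\
&= P(F)_\ell - P(F)_\ell P(FD)_\ell\, [P(FL_i)_\ell + P(FI_i)_\ell]
\end{aligned}
\]

\[
\begin{aligned}
\gamma^*_\ell &= CND_\ell \\
&= [1 - P(F)_\ell]\, P(FA)_\ell - [1 - P(F)_\ell]\, P(FA)_\ell\, P(WI_i)_\ell \\
&= [1 - P(F)_\ell]\, P(FA)_\ell\, [1 - P(WI_i)_\ell]
\end{aligned}
\]

\[
\begin{aligned}
\delta^*_\ell &= P[\text{successful FD/FI}] \\
&= 1 - P(F)_\ell - [1 - P(F)_\ell]\, P(FA)_\ell + P(F)_\ell P(FD)_\ell P(FI_i)_\ell \\
&= [1 - P(F)_\ell]\,[1 - P(FA)_\ell] + P(F)_\ell P(FD)_\ell P(FI_i)_\ell
\end{aligned}
\]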
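The effect of the alternative definitions at a single level is that the false-isolation probability P(F) P(FD) P(FL_i) is counted as a false removal (α*) instead of a failure to diagnose. A short numerical sketch, with hypothetical function and argument names and arbitrary values, makes the shift visible:

```python
def starred_level_measures(p_f, p_fd, p_fi, p_fl, p_fa, p_wi) -> dict:
    """alpha*, beta*, gamma*, delta* at one level (illustrative names)."""
    good = 1.0 - p_f
    false_isolation = p_f * p_fd * p_fl  # good RU removed although a fault exists
    return {
        "alpha_star": good * p_fa * p_wi + false_isolation,
        "beta_star": p_f - p_f * p_fd * (p_fl + p_fi),
        "gamma_star": good * p_fa * (1.0 - p_wi),
        "delta_star": good * (1.0 - p_fa) + p_f * p_fd * p_fi,
    }


starred = starred_level_measures(
    p_f=0.1, p_fd=0.9, p_fi=0.8, p_fl=0.1, p_fa=0.05, p_wi=0.5
)
# Portion of alpha* contributed by false isolation of a good RU.
moved_term = 0.1 * 0.9 * 0.1
```

The γ* and δ* values are unchanged from the earlier definitions; only the split between false removals and missed diagnoses differs.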


2.12 Testability Effectiveness Measures for the Organizational/Intermediate
System

α*, β*, and γ* errors for the organizational/intermediate system are derived
using the decision tree in Figure 2.4 as follows:

\[
\begin{aligned}
\alpha^*_{OI} ={}& P[\text{removing a good RU at the organizational/intermediate system}] \\
={}& P[\text{false report (intermediate)}] + P[\text{module false isolation}] \\
={}& [1 - P(F)_O]\, P(WI_i)_O P(FA)_O P(FA)_I P(WI_i)_I \\
&+ P(F)_O P(FD)_O P(FI_i)_O P(FD)_I P(FL_i)_I \\
&+ P(F)_O P(FD)_O P(FL_i)_O P(FA)_I P(WI_i)_I
\end{aligned}
\]

\[
\begin{aligned}
\beta^*_{OI} ={}& P[\text{detecting or isolating no faults} \mid \text{prime equipment is faulty}] \\
={}& P[\text{failure to report (org.)}] + P[\text{failure to report (int.)}] + P[\text{failure to isolate LRU}] \\
&+ P[\text{failure to isolate module}] + P[\text{no fault isolation}] + P[\text{no module isolation}] \\
={}& [P(F)_O - P(F)_O P(FD)_O] \\
&+ [P(F)_O P(FD)_O P(FI_i)_O - P(F)_O P(FD)_O P(FI_i)_O P(FD)_I] \\
&+ [P(F)_O P(FD)_O - P(F)_O P(FD)_O P(FI_i)_O - P(F)_O P(FD)_O P(FL_i)_O] \\
&+ [P(F)_O P(FD)_O P(FI_i)_O P(FD)_I - P(F)_O P(FD)_O P(FI_i)_O P(FD)_I P(FI_i)_I
    - P(F)_O P(FD)_O P(FI_i)_O P(FD)_I P(FL_i)_I] \\
&+ [P(F)_O P(FD)_O P(FL_i)_O - P(F)_O P(FD)_O P(FL_i)_O P(FA)_I] \\
&+ [P(F)_O P(FD)_O P(FL_i)_O P(FA)_I - P(F)_O P(FD)_O P(FL_i)_O P(FA)_I P(WI_i)_I] \\
={}& P(F)_O - P(F)_O P(FD)_O P(FI_i)_O P(FD)_I P(FL_i)_I
 - P(F)_O P(FD)_O P(FI_i)_O P(FD)_I P(FI_i)_I \\
&- P(F)_O P(FD)_O P(FL_i)_O P(FA)_I P(WI_i)_I
\end{aligned}
\]

\[
\begin{aligned}
\gamma^*_{OI} ={}& P[\text{correct action of not isolating LRU and/or module after isolating a good LRU}] \\
={}& CND_O + CND_I + RTOK_I \\
={}& \{[1 - P(F)_O] P(FA)_O - [1 - P(F)_O] P(FA)_O P(WI_i)_O\} \\
&+ \{[1 - P(F)_O] P(WI_i)_O P(FA)_O P(FA)_I - [1 - P(F)_O] P(WI_i)_O P(FA)_O P(FA)_I P(WI_i)_I\} \\
&+ \{[1 - P(F)_O] P(WI_i)_O P(FA)_O - [1 - P(F)_O] P(WI_i)_O P(FA)_O P(FA)_I\} \\
={}& [1 - P(F)_O] P(FA)_O - [1 - P(F)_O] P(FA)_O P(WI_i)_O P(FA)_I P(WI_i)_I \\
={}& [1 - P(F)_O] P(FA)_O\, [1 - P(WI_i)_O P(FA)_I P(WI_i)_I]
\end{aligned}
\]

\[
\begin{aligned}
\delta^*_{OI} ={}& P[\text{successful performance of the diagnostic system in the organizational and intermediate levels as a whole}] \\
={}& P[\text{isolating the faulty module} \mid \text{prime equipment is faulty}]
 + P[\text{reporting no failure} \mid \text{prime equipment is good}] \\
={}& P(F)_O P(FD)_O P(FI_i)_O P(FD)_I P(FI_i)_I + [1 - P(F)_O]\,[1 - P(FA)_O]
\end{aligned}
\]


2.13 Testability Effectiveness Measures for the Organizational/Intermediate/Depot
System

α*, β*, and γ* errors for the organizational/intermediate/depot system are
derived using the decision tree in Figure 2.5 as follows:

\[
\begin{aligned}
\alpha^* ={}& P[\text{removing a good RU at the organizational/intermediate/depot system}] \\
={}& P[\text{removing a good unit at the depot level}] \\
={}& [1 - P(F)_O]\, P(FA)_O P(WI_i)_O P(FA)_I P(WI_i)_I P(FA)_D P(WI_i)_D \\
&+ P(F)_O P(FD)_O P(FI_i)_O P(FD)_I P(FI_i)_I P(FD)_D P(FL_i)_D \\
&+ P(F)_O P(FD)_O P(FI_i)_O P(FD)_I P(FL_i)_I P(FA)_D P(WI_i)_D \\
&+ P(F)_O P(FD)_O P(FL_i)_O P(FA)_I P(WI_i)_I P(FA)_D P(WI_i)_D
\end{aligned}
\]

\[
\begin{aligned}
\beta^* ={}& P[\text{detecting or isolating no faults} \mid \text{prime equipment is faulty}] \\
={}& P(F)_O - [P(F)_O P(FD)_O P(FI_i)_O P(FD)_I P(FI_i)_I P(FD)_D P(FI_i)_D] \\
&- P(F)_O P(FD)_O P(FI_i)_O P(FD)_I P(FI_i)_I P(FD)_D P(FL_i)_D \\
&- P(F)_O P(FD)_O P(FI_i)_O P(FD)_I P(FL_i)_I P(FA)_D P(WI_i)_D \\
&- P(F)_O P(FD)_O P(FL_i)_O P(FA)_I P(WI_i)_I P(FA)_D P(WI_i)_D
\end{aligned}
\]

\[
\begin{aligned}
\gamma^* ={}& P[\text{correct action of not isolating LRU, module and/or unit after mistakenly detecting and/or isolating a good LRU or module}] \\
={}& \gamma \text{ as in the previous case in Section 2.9} \\
={}& [1 - P(F)_O]\, P(FA)_O\, [1 - P(WI_i)_O P(FA)_I P(WI_i)_I P(FA)_D P(WI_i)_D]
\end{aligned}
\]

\[
\begin{aligned}
\delta^* ={}& P[\text{successful performance of the diagnostic system as a whole}] \\
={}& [1 - P(F)_O]\,[1 - P(FA)_O] + P(F)_O P(FD)_O P(FI_i)_O P(FD)_I P(FI_i)_I P(FD)_D P(FI_i)_D
\end{aligned}
\]

In the special case when the depot repair cycle is perfect, as in Figure 2.6,

\[ P(WI_i)_D = 0, \qquad P(FI_i)_D = 1, \qquad P(FL_i)_D = 0 \]

every term of α* contains either P(WI_i)_D or P(FL_i)_D, so that

\[
\begin{aligned}
\alpha^* &= 0 \\
\beta^* &= P(F)_O - [P(F)_O P(FD)_O P(FI_i)_O P(FD)_I P(FI_i)_I P(FD)_D] \\
\gamma^* &= [1 - P(F)_O]\, P(FA)_O \\
\delta^* &= [1 - P(F)_O]\,[1 - P(FA)_O] + P(F)_O P(FD)_O P(FI_i)_O P(FD)_I P(FI_i)_I P(FD)_D
\end{aligned}
\]
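The three-level starred measures can be checked numerically by writing each false-removal path as a separate product. The sketch below assumes the four α* paths are: a false alarm propagated through all three levels, a false unit isolation at the depot, a false module isolation carried to the depot, and a false LRU isolation carried to the depot. The dictionary keys and values are illustrative only.

```python
def starred_system_measures(p_f: float, org: dict, inter: dict, depot: dict) -> dict:
    """alpha*, beta*, gamma*, delta* for the O/I/D system (illustrative)."""
    good = 1.0 - p_f
    false_alarm_org = good * org["fa"]
    detected_org = p_f * org["fd"]
    # Faulty module correctly delivered to the depot.
    correct_oi = detected_org * org["fi"] * inter["fd"] * inter["fi"]
    correct = correct_oi * depot["fd"] * depot["fi"]
    # A good module at the depot is reported faulty and a good unit isolated.
    depot_false = depot["fa"] * depot["wi"]

    t1 = false_alarm_org * org["wi"] * inter["fa"] * inter["wi"] * depot_false
    t2 = correct_oi * depot["fd"] * depot["fl"]
    t3 = detected_org * org["fi"] * inter["fd"] * inter["fl"] * depot_false
    t4 = detected_org * org["fl"] * inter["fa"] * inter["wi"] * depot_false

    return {
        "alpha_star": t1 + t2 + t3 + t4,
        "beta_star": p_f - correct - t2 - t3 - t4,
        "gamma_star": false_alarm_org - t1,
        "delta_star": good * (1.0 - org["fa"]) + correct,
    }


org_p = {"fd": 0.9, "fi": 0.8, "fl": 0.1, "fa": 0.05, "wi": 0.6}
int_p = {"fd": 0.95, "fi": 0.85, "fl": 0.05, "fa": 0.1, "wi": 0.5}
dep_p = {"fd": 0.98, "fi": 0.9, "fl": 0.05, "fa": 0.02, "wi": 0.3}
perfect_dep = {"fd": 0.98, "fi": 1.0, "fl": 0.0, "fa": 0.02, "wi": 0.0}

general_star = starred_system_measures(0.1, org_p, int_p, dep_p)
perfect_star = starred_system_measures(0.1, org_p, int_p, perfect_dep)
```

With the perfect-depot parameters every α* path is multiplied by zero, so α* vanishes and β*, γ*, δ* collapse to the special-case expressions.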


3.  Maintainability,  Availability  and  Reliability 


3.1  System  Maintainability 

System maintainability is defined as a measure of the capability of the
diagnostic system to detect, isolate and repair the equipment and return it to
its operational status.

Maintainability can also be defined as a characteristic of design and
installation which is expressed as the probability that an item will be retained
in, or restored to, specified conditions within a given period of time, when
maintenance is performed in accordance with prescribed procedures and resources
[MIL-STD-721B]. Maintainability can be controlled and improved by increasing the
effectiveness of the test equipment.

The primary objective of the maintainability analysis is to translate the
so-called requirements into usable maintainability parameters such as mean
maintenance down time, allowable maximum maintenance time, mean preventive
maintenance time, maintenance manhours per flight hour, turnaround time required
for returning the equipment to an operationally ready condition, percentage of
equipment which can be down for maintenance and still permit the attainment of
the operational requirement, Mean-Time-to-Restore (MTR), Mean-Time-to-Repair
(MTTR), and mean-time-between-maintenance. The most important of the above
parameters are MTR and MTTR; therefore, they will be discussed in more detail.

3.1.1 Mean-Time-To-Restore (MTR)

MTR is the mean time interval between shutdown for maintenance and restoration
of the system to operating status. This does not include supply time and
administrative time. MTR is best used where active single or multiple parallel
redundancy exists within the system or subsystem.

The MTR may not include any repair time where the function can be restored by
other means. Indeed, with the advent of microcomputers and advanced electronics,
restoration may be immediate and automatic since extensive redundancy can be
packed into electronic equipment.

The MTR cannot be used as a sole maintainability requirement, since the
maintainability of the failed item is not completely considered.


3.1.2  Mean-Time-To-Repair  (MTTR) 

MTTR is defined in MIL-STD-721B as the total corrective maintenance time divided
by the total number of corrective maintenance actions during a given period of
time. Further, the repair time will consist of those actions required to perform
on-line repair of a failed item of equipment. The repair time includes the time
to isolate the fault to the LRU level, the time required to remove and replace
the item, and the time required to verify that the fault has been corrected.
Supply time and administrative time are not included.

The MTTR can also be defined as the elapsed time from start of work on the
correction of a malfunction indication to the completion of the maintenance
action and verification of the correction.

The Mean-Time-to-Repair, if correctly defined, can provide significant insight
into the true impact of the diagnostic system on overall system maintainability
and availability. It can also be considered as a measure of the adequacy of the
system in meeting real operational requirements.

MTTR  may  be  further  broken  down  into  four  components: 

1)  Set-up  time 

2)  Troubleshooting  time 

3)  Remove  and  replace/repair  time 

4)  Checkout  time. 


Only the second and fourth components relate to FD/FI capability, while the
first and third relate to the design of support equipment and overall system
maintainability.

It is important to mention that the MTTR can be used interchangeably with the
mean-corrective-maintenance-time (Mct). Most maintainability parameters and
criteria (including the MTTR) are aimed at "primary maintenance", that is,
maintenance required to restore a system or equipment to a "specified condition
within a given period of time at one level". However, little or no
maintainability attention is paid to the problem of "secondary maintenance",
that is, the problem of subsequent repair below the LRU level (modules and
parts).

In addition, the way MTTR is usually defined does not differentiate between the
time consumed by the diagnostic system to correctly isolate and repair the
faulty unit and the time wasted isolating and repairing a good unit, or the time
wasted in repairing a unit that is nevertheless returned as a bad unit.

Therefore, it is suggested that the MTTR be broken into two major components in
order to shed light on the actual maintainability of the system at each level.
These components are the mean-time-for-actual-repair and the
mean-time-for-unnecessary-repair.

3.1.3  Maintainability  at  Different  Levels  of  Repair 

In order to evaluate system maintainability at any level of repair, at least two
figures should be considered. The first should show, with a certain probability
and given that the fault is correctly detected, how much time it will take the
diagnostic system to correctly isolate the fault and replace the faulty
replaceable unit. This figure indicates how fast the diagnostic system is in
helping the diagnostic group to recover from a failure and become functional
again. This figure can be represented by the Mean-Time-to-Replace at each level.
The second figure should show how long, on the average, it will take to repair
the isolated replaceable unit and return it as a spare part to the same level at
which it was isolated. This figure can be decomposed into two different
parameters to cover the two cases concerned: returning the replaceable unit as a
good spare part or as a bad (faulty) spare part. This figure can be represented
by the Mean-Time-to-Repair.

From the above discussion, measures of system maintainability at different
levels can be redefined to represent the actual real-life situation.
Furthermore, the new measures will guarantee covering both "primary" and
"secondary" maintenance.

3.1.4  System  Maintainability  at  the  Organizational  Level 

The  main  concern  at  this  level  is  the  capability  of  the  prime  equipment  to  be 
returned  to  operational  status  in  a  specified  period  of  time  (mission  time). 

The  maintainability  measures  at  the  organizational  level  can  be  defined  as 
follows: 

a) Mean-Time-To-Replace at the Organizational Level (MTR_O)

If the prime equipment is faulty, then we can assert with probability
P_O that it will take the BIT/ATE time MTR_O to correctly detect,
isolate, and replace the faulty LRU, and/or switch to a redundant LRU, in
order to return the equipment to its normal functional status, where

\[ P_O = P(FD)_O\, P(FI_i)_O \]

b) Mean-Time-For-Actual-Repair at the Organizational Level (MTAR_O)

MTAR_O is the elapsed time from start of work on the correction of a
faulty LRU, after correct detection, isolation, and removal from the
equipment, until correctly repairing it (replacing the faulty module with
a good one at the intermediate level) and returning this LRU as a good
spare part to the organizational level.

Let M_O(t) be the probability that a faulty LRU which was correctly
isolated at the organizational level can be repaired in time t (at the
intermediate level). When time to repair has the exponential
distribution, the probability of repair in time t can be expressed as

\[ M_O(t) = 1 - e^{-\mu_I t} \]

where,

MTR_I = mean repair time of LRUs at the intermediate level,
μ_I = repair rate of LRUs at the intermediate level = 1/MTR_I.

\[
\begin{aligned}
M_O(t) &= 1 - e^{-t/MTR_I} \\
e^{-t/MTR_I} &= 1 - M_O(t) \\
-t/MTR_I &= \ln\,(1 - M_O(t)) \\
MTR_I &= \frac{-t}{\ln\,(1 - M_O(t))}
\end{aligned}
\]

If St_OI is the shipping time of an LRU between the organizational and
intermediate levels, then

\[ MTAR_O = MTR_I + 2\,St_{OI} \]

c) Mean-Time-For-Unnecessary-Repair of LRU at the Organizational Level
(MTUR_O)

MTUR_O is the elapsed time from start of work on the correction of an
LRU (it can be faulty or not) after it has been isolated until returning
this LRU as a spare part to the organizational level (either a good LRU
was isolated, or a faulty LRU is not correctly repaired and is returned
as a bad spare part).


d) Mean-Time-to-Repair LRU (MTTR_O)

MTTR_O is the expected elapsed time from start of work on the correction
of an LRU failure indication until repairing this LRU and returning it as
a spare part at the organizational level. MTTR_O can be computed using
its components MTAR_O and MTUR_O as follows:

\[
\begin{aligned}
MTTR_O ={}& MTAR_O\, [P(F)_O P(FD)_O P(FI_i)_O P(FD)_I P(FI_i)_I] \\
&+ MTUR_O\, [P(F)_O P(FD)_O P(FL_i)_O + P(F)_O P(FD)_O P(FI_i)_O (1 - P(FD)_I) \\
&\qquad + P(F)_O P(FD)_O P(FI_i)_O P(FD)_I (1 - P(FI_i)_I) + (1 - P(F)_O) P(WI_i)_O]
\end{aligned}
\]
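The organizational-level measures in items a) through d) can be sketched numerically as follows. This is an illustration under the stated exponential-repair assumption; the function names and example values are hypothetical, and the MTTR_O weighting is transcribed term by term from the expression in item d).

```python
import math


def mean_repair_time(t: float, m_t: float) -> float:
    """Invert M(t) = 1 - exp(-t / MTR) to recover the mean repair time."""
    if t <= 0.0 or not (0.0 < m_t < 1.0):
        raise ValueError("need t > 0 and 0 < M(t) < 1")
    return -t / math.log(1.0 - m_t)


def mtar(mtr_next_level: float, shipping_time: float) -> float:
    """Actual-repair time: repair at the next level plus shipping both ways."""
    return mtr_next_level + 2.0 * shipping_time


def mttr_org(mtar_o, mtur_o, p_f, fd_o, fi_o, fl_o, wi_o, fd_i, fi_i) -> float:
    """MTTR_O as the weighted combination of MTAR_O and MTUR_O."""
    w_actual = p_f * fd_o * fi_o * fd_i * fi_i
    w_unnecessary = (
        p_f * fd_o * fl_o                        # good LRU isolated instead
        + p_f * fd_o * fi_o * (1.0 - fd_i)       # intermediate level reports no failure
        + p_f * fd_o * fi_o * fd_i * (1.0 - fi_i)  # faulty module not isolated
        + (1.0 - p_f) * wi_o                     # good equipment, LRU wrongly isolated
    )
    return mtar_o * w_actual + mtur_o * w_unnecessary


# Example: 39.3% of LRUs repaired within 2 hours implies MTR_I of about 4 hours.
mtr_i = mean_repair_time(2.0, 1.0 - math.exp(-0.5))
mtar_o = mtar(4.0, 1.5)
ideal_mttr = mttr_org(10.0, 4.0, p_f=0.5, fd_o=1.0, fi_o=1.0, fl_o=0.0,
                      wi_o=0.0, fd_i=1.0, fi_i=1.0)
```

In the idealized `ideal_mttr` case (perfect detection and isolation, no wrong isolation), the unnecessary-repair weight is zero and only the actual-repair term contributes.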

3.1.5  System  Maintainability  at  the  Intermediate  Level 

The  main  concern  at  this  level  is  the  capability  of  an  LRU  to  be  returned  to 
a  serviceable  status  by  the  specified  test  and  repair  equipment  within  a  specified 
period  of  time. 

The maintainability measures at the intermediate level can be defined as
follows:

a)  Mean-Time-to-Replace at the Intermediate Level ($MTR_I$)

If the LRU is really faulty, then we can assert with probability $P_I$
that it will take the external test equipment time $MTR_I$ to correctly
verify the failure, and to isolate and replace the faulty module in order to
return the LRU to its normal functional status, where

$$ P_I = P(FD)_I \cdot P(FI_i)_I $$

$MTR_I$ = fault detection time + time to isolate faulty module + time
to replace faulty module.


b)  Mean-Time-for-Actual-Repair of Modules ($MTAR_I$)

$MTAR_I$ is defined as the elapsed time from start of work on the
correction of a faulty module (after correct detection and isolation) until
it is correctly repaired by replacing the faulty component with a good one,
either at the intermediate or at the depot level, and returned as a good
spare part to the intermediate level. (Notice that if a faulty module is to
be discarded and not repaired, then $MTAR_I = 0$.) Let $M_I(t)$ be the
probability that a faulty module which was correctly isolated at the
intermediate level can be repaired in time $t$ (at the depot). When the time
to repair has the exponential distribution, the probability of repair in
time $t$ can be expressed as

$$ M_I(t) = 1 - e^{-\mu_D t} $$

where

$MTR_D$ = mean-time-to-replace modules at the depot,

$\mu_D$ = repair rate of modules at the depot,

$\mu_D = 1/MTR_D$.

Then

$$ M_I(t) = 1 - e^{-t/MTR_D} $$

$$ e^{-t/MTR_D} = 1 - M_I(t) $$

$$ -t/MTR_D = \ln(1 - M_I(t)) $$

$$ MTR_D = \frac{-t}{\ln(1 - M_I(t))} $$

If $St_{ID}$ is the shipping time of a module between the intermediate level
and the depot, then

$$ MTAR_I = MTR_D + 2\,St_{ID} $$

c)  Mean-Time-for-Unnecessary-Repair of Modules ($MTUR_I$)

The elapsed time from start of work on the correction of any module
(it may be faulty or not) after it has been isolated, until this module is
returned as a spare part to the intermediate level (either a good
module is incorrectly isolated and unnecessarily checked, or a faulty
module is not correctly repaired and is then returned as a bad spare part to
the intermediate level). This covers the time consumed in the following
cases:

• All cases resulting from false module isolation.

• All cases resulting from successfully detecting and isolating the
faulty module except the case of successful detection and isolation of
the faulty component.

• All cases resulting from isolating a good module.

d)  Mean-Time-to-Repair Modules ($MTTR_I$)

The Mean-Time-to-Repair Module is the elapsed time from start of
work on the correction of a module failure indication until this module is
repaired and returned as a spare part at the intermediate level. It is
a function of $MTAR_I$ and $MTUR_I$ and can be expressed as follows:

$$ MTTR_I = MTAR_I \cdot [P(F)_I \cdot P(FD)_I \cdot P(FI_i)_I \cdot P(FD)_D \cdot P(FI_i)_D] $$
$$ \quad + MTUR_I \cdot [P(F)_I \cdot P(FD)_I \cdot P(FL_i)_I + P(F)_I \cdot P(FD)_I \cdot P(FI_i)_I \cdot (1 - P(FD)_D) $$
$$ \quad + P(F)_I \cdot P(FD)_I \cdot P(FI_i)_I \cdot P(FD)_D \cdot (1 - P(FI_i)_D) + (1 - P(F)_I) \cdot P(WI_i)_I] $$

3.1.6  System  Maintainability  at  the  Depot 

The  maintainability  at  the  depot  can  be  measured  by  the  capability  of  the 
modules  to  be  repaired  and  returned  to  a  serviceable  condition  at  a  specified 
percentage  of  unit  cost.  This  can  be  described  by  the  mean-time-to-replace  as 
well  as  percent-cost-to-repair. 


a)  Mean-Time-to-Replace at the Depot ($MTR_D$)

If the module is really faulty, then we can assert with probability
$P_D$ that it will take the test equipment at the depot time $MTR_D$ to
correctly verify the failure, and to isolate and replace the faulty
component in order to return the module to its normal operational condition.

$$ P_D = P(FD)_D \cdot P(FI_i)_D $$

$MTR_D$ = fault detection time + time to isolate components + time to
replace faulty component.

b)  Percent-Cost-to-Repair at the Depot ($PCTR_D$)

If the module is really faulty, then we can assert with probability
$P_D$ that the cost of correctly verifying the failure, isolating, and
replacing the faulty module as a percentage of the initial cost of the
module is $PCTR_D$.

3.2  Reliability 

Reliability  is  the  probability  that  a  system  or  equipment  will  give  satisfac¬ 
tory  performance  for  a  specified  period  of  time  when  used  under  stated  condi¬ 
tions.  When  related  to  a  specific  mission,  reliability  may  be  defined  as  the 
probability  of  a  successful  mission  of  given  duration  under  specified  use  condi¬ 
tions. 

The literature on reliability contains other parameters such as
Mean-Time-Between-Failure (MTBF), Mean-Time-To-Failure (MTTF), and
Mean-Time-To-First-Failure (MTTFF). These three terms are often used
interchangeably because the exponential law applies to the majority of
electronic equipment, and under the exponential law the three are identical.
However, if the failure distribution is not exponential, these terms do not
describe the same thing. MTBF is specifically applicable to a population of
equipment where we are concerned with the average time between the individual
equipment failures. Where we are concerned with one equipment or one system,
the measures MTTF and MTTFF are applicable.

The difference between MTTF and MTTFF is the specification of the initial
operating conditions and how time is counted. MTTF is a measure of the expected
time the system is in an operable state before all the equipments reach a failed
state. In arriving at this measure we count time from when the system was
initially fully operable until all equipments reach a failed state, with no
repairs made until the system is in a failed state. MTTFF is a measure of the
expected time the system is in an operable state when individual equipments are
repaired as they fail, given that all equipments were initially operable when we
began counting time.

3.2.1  Reliability  of  the  Prime  Equipment 

Besides its main function, the prime equipment, using the BIT/ATE system, has
the ability to detect and isolate any malfunction in any LRU if a malfunction
exists, or to report no malfunction if none exists. Therefore, the reliability
of the prime equipment should be affected by the performance of the BIT/ATE
system.

Accordingly, the reliability of the prime equipment, $R$, can be defined as the
probability that either the equipment is good and the BIT/ATE system reports no
failure, or the equipment is faulty and the BIT/ATE system successfully detects
and isolates the faulty LRU and replaces it, with a down time not exceeding a
given time $t_c$ which will not adversely affect the overall mission.
Reliability in time $t$ can be expressed as

R(t) = P[BIT/ATE reports no failure | equipment is good] + P[BIT/ATE correctly
detects and isolates the faulty LRU and replaces it in time $t_c$ | equipment
is faulty]

$$ R(t) = [1 - P(F)_O] \cdot [1 - P(FA)_O] + P(F)_O \cdot P(FD)_O \cdot P(FI_i)_O \cdot P_R(t_c) \qquad (3.1) $$

where

$P(F)_O$ = probability of equipment failure
$= 1 - \prod_{i=1}^{N} e^{-\lambda_{Oi} t}$

$P(FA)_O$ = probability of false alarm at the organizational level

$P(FD)_O$ = probability of correct detection | equipment is faulty

$P(FI_i)_O$ = probability of correct isolation to $i$ or fewer LRUs | equipment
is faulty

$P_R(t_c)$ = probability that a faulty LRU which is correctly isolated can be
replaced in time $t_c$, where $t_c$ is the critical time for replacing the
faulty LRU, beyond which the mission fails.

Substituting in equation 3.1,

$$ R(t) = \Big[\prod_{i=1}^{N} e^{-\lambda_{Oi} t}\Big] \cdot [1 - P(FA)_O] + \Big[1 - \prod_{i=1}^{N} e^{-\lambda_{Oi} t}\Big] \cdot P(FD)_O \cdot P(FI_i)_O \cdot P_R(t_c) $$

It is important to know the distribution of the replacement time of the
different LRUs in order to find the value of $P_R(t_c)$. In any event, if
$t_c > t_m$, where $t_m$ is the mission time, or if replacing the faulty LRU
is done automatically after correctly identifying it (through redundancy, for
example), then $P_R(t_c) = 1$.
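Equation 3.1 can be evaluated directly once the LRU failure rates are known. The sketch below assumes independent exponential LRU failures, as in the text; the function names and the sample rates are illustrative only.

```python
import math


def p_equipment_failure(rates, t):
    """P(F)_O = 1 - prod_i exp(-lambda_Oi * t) for LRU failure rates `rates`."""
    # The product of exponentials equals the exponential of the summed rates.
    return 1.0 - math.exp(-sum(rates) * t)


def reliability(rates, t, p_fa, p_fd, p_fi, p_r=1.0):
    """Equation 3.1: good and no false alarm, or faulty and handled in time."""
    p_f = p_equipment_failure(rates, t)
    return (1.0 - p_f) * (1.0 - p_fa) + p_f * p_fd * p_fi * p_r


if __name__ == "__main__":
    # Hypothetical numbers: three LRUs, 10-hour mission.
    print(reliability([1e-3, 2e-3, 5e-4], 10.0, p_fa=0.02, p_fd=0.95, p_fi=0.9))
```

Setting `p_r=1.0` corresponds to the case noted above in which replacement is automatic or the critical time exceeds the mission time.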


3.2.2  Mean-Time-Between-Failures  (MTBF) 


Mean life, or mean-time-between-failures (MTBF), is the total operating time of
the prime equipment divided by the total number of failures, where

total number of failures = number of actual failures (detected) + number of
false alarms.

Let $NF$ be the number of actual failures. Then

$$ \lambda = \text{equipment failure rate} = \frac{\text{number of actual failures}}{\text{operating time}} = \frac{NF}{\text{operating time}} $$

$$ \text{number of false alarms} = P(FA)_O \cdot \frac{NF}{P(F)_O} \cdot (1 - P(F)_O) $$

$$ MTBF = \frac{\text{operating time}}{\text{number of actual failures} + \text{number of false alarms}} $$

$$ \frac{1}{MTBF} = \lambda + \frac{P(FA)_O \cdot \lambda \, (1 - P(F)_O)}{P(F)_O} $$

$$ MTBF = \frac{P(F)_O}{\lambda \, [P(F)_O + P(FA)_O \cdot (1 - P(F)_O)]} $$

Notice that if false alarms are ignored (as in most definitions of MTBF),
$P(FA)_O = 0$ and $MTBF = 1/\lambda$.
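The effect of false alarms on MTBF can be illustrated with a short calculation. This is a sketch of the closed form MTBF = P(F) / (lambda * [P(F) + P(FA) * (1 - P(F))]) derived in this subsection; the function name and the sample values are assumptions.

```python
def mtbf_with_false_alarms(lam, p_f, p_fa):
    """MTBF when false alarms are counted as failures.

    lam  -- equipment failure rate (actual failures per operating hour)
    p_f  -- probability of equipment failure, P(F)_O
    p_fa -- probability of false alarm, P(FA)_O
    """
    return p_f / (lam * (p_f + p_fa * (1.0 - p_f)))


if __name__ == "__main__":
    # Hypothetical case: lambda = 0.01/h, P(F) = 0.3.
    for p_fa in (0.0, 0.05, 0.2):
        print(p_fa, mtbf_with_false_alarms(0.01, 0.3, p_fa))
```

With `p_fa = 0` the function returns 1/lambda, and any positive false alarm probability shortens the apparent MTBF.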


3.3  System  Availability 

System availability is the probability that a system or equipment, when used
under stated conditions in an ideal support environment (i.e., available tools,
spares, manpower, etc.), will operate satisfactorily at any point in time. It
excludes preventive maintenance actions, logistics supply time, and
administrative downtime.

Availability is a complex function of the LRUs' failure rate, operating time
(reliability), and down time (maintainability/supportability). In general,
availability can be expressed as

$$ A = \frac{MTBF}{MTBF + MTTR} $$
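As a worked illustration, availability can be computed from the two mean times. The sketch assumes the standard form A = MTBF / (MTBF + MTTR); the function name is illustrative.

```python
def availability(mtbf, mttr):
    """Fraction of time the system is operable: MTBF / (MTBF + MTTR)."""
    if mtbf < 0 or mttr < 0 or mtbf + mttr == 0:
        raise ValueError("mean times must be non-negative and not both zero")
    return mtbf / (mtbf + mttr)
```

For instance, an MTBF of 90 hours and an MTTR of 10 hours give an availability of 0.9. Because false alarms shorten MTBF and diagnostic errors lengthen MTTR, both imperfections lower this figure.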


4.  Cost  of  Testability  Performance 


In this section, the costs involved in testability performance at all levels
of repair are analyzed; then the expected costs associated with the errors of
the diagnostic system are developed, modeled, and used to shed light on the
actual system effectiveness. These costs are used to compute a realistic life
cycle cost for the equipment/system that accounts for the actual performance of
the equipment and the consequences of the performance of the diagnostic
system (actual failure, false alarm, CND, RTOK, false isolation, etc.).


4.1  Cost  Elements  at  Different  Levels 

In  this  section  all  cost  elements  at  all  different  levels  are  presented.  In 
addition,  the  following  parameters  are  also  considered. 

a)  Spare  Parts  Availability  (Asp) 

Spare  parts  availability  is  a  measure  of  how  available  the  spare 
parts  are  when  needed.  It  is  the  probability  that  there  is  a  spare  part 
available  when  one  is  needed  at  the  organizational  level. 

b)  Mission  Survival  Factor  (V) 

Mission  survival  factor  is  a  measure  of  how  vital  the  failure  of  any 
LRU  is  to  the  success  of  the  mission  and  can  be  represented  as: 

$$ V = \text{Prob}[\text{the mission can be accomplished with any faulty LRU}] $$

$$ V = \sum_{i} P[\text{mission can be accomplished} \mid LRU_i \text{ is faulty}] \cdot P[LRU_i \text{ is faulty}] $$
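The mission survival factor is a total-probability sum over the LRUs. A minimal sketch follows; the function name and the example values are hypothetical.

```python
def mission_survival_factor(p_success_given_fault, p_fault):
    """V = sum_i P[mission accomplished | LRU_i faulty] * P[LRU_i faulty]."""
    if len(p_success_given_fault) != len(p_fault):
        raise ValueError("both lists need one entry per LRU")
    return sum(s * f for s, f in zip(p_success_given_fault, p_fault))
```

For example, with two LRUs where the first is non-critical (success probability 1.0, fault share 0.25) and the second is critical (success probability 0.0, fault share 0.75), V is 0.25.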


c)  Mission Abortion Factor $P(MA)$

Mission abortion factor is a measure of the possibility of aborting
the mission as a result of the confusion caused by a Can Not Duplicate event
at the organizational level ($CND_O$), and can be represented as:

$$ P(MA) = \text{Prob}[\text{mission abortion} \mid CND_O] $$

d)  Discard Factor ($D_I$)

Discard  factor  is  the  probability  that  any  module  will  be  discarded 
at  the  intermediate  level  instead  of  being  introduced  to  the  depot. 


4.1.1  Costs of BIT/ATE Implementation ($CI_O$)

The costs associated with the BIT/ATE implementation include:

1.  Acquisition Costs ($CA_O$)

a.  the cost of BIT/ATE hardware

b.  the cost of BIT/ATE software

2.  Initial Logistics Support Costs ($CL_O$)

a.  the initial cost of training personnel to maintain the BIT/ATE system

b.  any one-time cost associated with the introduction of the BIT/ATE
system into the maintenance concept

Acquisition and initial logistics support costs are one-time costs associated
with the implementation of the BIT/ATE system. Hence, $CI_O$ is a one-time
implementation cost for the BIT/ATE system, where

$$ CI_O = CA_O + CL_O $$


4.1.2  BIT/ATE Execution/Isolation Cost (CLI)

a.  software maintenance - the expenditure resulting from inherent software
error corrections and future change requirements.

b.  technical data maintenance - the cost of updating and revising technical
publications.

c.  attrition training - the cost of training new maintenance personnel.

d.  maintenance labor - the cost of labor to maintain the BIT/ATE system.

e.  maintenance material - the cost of material to repair the BIT/ATE system
when it malfunctions.

A study by Bogard et al. (1980) investigates Operations and Support costs,
and finds that the maintenance material and labor costs for a BIT/ATE system are
negligible. Furthermore, software maintenance costs, which usually account for
the majority of the BIT/ATE system Operations and Support costs, are strongly
correlated with the number of hours that the BIT/ATE system is in operation. So
the Operations and Support cost for the BIT/ATE system depends on the
frequency with which the system is executed. This cost is incurred each time
that the BIT/ATE isolation process is executed, and is called the LRU isolation
cost.

LRU isolation cost is the average cost of isolating an LRU by BIT/ATE,
assuming that the BIT/ATE system indicates a failure. This cost is a function of
the isolation procedure and the cost of the different tests which can be used by
BIT/ATE.

$$ CLI = \sum_{i=1}^{N} CLI_i \cdot P(f_i)_O $$

where

$CLI_i$ = cost of isolating $LRU_i$

$P(f_i)_O$ = probability of $LRU_i$ failure.

4.1.3  LRU  Removal  and  Replacement  Costs  (CLR) 

Removing an LRU from a digital system at the organizational level, and
replacing it with a spare, involves disconnecting the LRU, removing it,
inserting a spare in the system, and connecting the spare. Some fixed costs may
be included in the removal cost, while the time to remove the LRU depends on
the number of modules it contains.

The time to disconnect and reconnect the LRU is a function of the number of
connections that must be severed. Any connection between a module in $LRU_j$
and a module not within $LRU_j$ must be disconnected in order to remove the LRU.
These are the external connections of the LRU, and the number of external
connections is denoted $E_j$.

The cost of the time it takes to remove and replace $LRU_j$, $CLR_j$, generally
depends on the labor rate and the crew size.

Caponecchi (1971) develops an empirical relationship for the time to remove
and replace an LRU from the system. It can be modified in this problem to express
the cost of removing and replacing $LRU_j$, which is:

$$ CLR_j = e_1 + e_2 M_j + e_3 \, e^{e_4 E_j} + CL_j $$

where $e_1$ is a fixed cost, $e_2$ is a cost associated with each module of the
LRU, $e_3$ is a cost modifying the exponential relationship of the number of
connections, $e_4$ is a constant modifying the number of connections, and $CL_j$
is the cost of $LRU_j$.

The expected cost to remove an LRU from the equipment/system, CLR, is
computed as

$$ CLR = \sum_{i=1}^{N} CLR_i \cdot P(f_i)_O $$

where $P(f_i)_O$ is the probability that $LRU_i$ is the faulty LRU.
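The per-LRU cost relation and its expectation can be sketched as follows. The coefficient values in the example are made up for illustration; only the functional form (fixed cost, per-module cost, exponential connection term, LRU cost) is taken from the text.

```python
import math


def lru_removal_cost(e1, e2, e3, e4, modules, connections, lru_cost):
    """CLR_j = e1 + e2*M_j + e3*exp(e4*E_j) + CL_j."""
    return e1 + e2 * modules + e3 * math.exp(e4 * connections) + lru_cost


def expected_cost(costs, fault_probs):
    """Probability-weighted cost over LRUs: sum_i cost_i * P(f_i)_O."""
    if len(costs) != len(fault_probs):
        raise ValueError("one cost and one probability per LRU")
    return sum(c * p for c, p in zip(costs, fault_probs))


if __name__ == "__main__":
    # Hypothetical LRUs: (modules, external connections, unit cost).
    lrus = [(5, 40, 1200.0), (12, 90, 4500.0)]
    costs = [lru_removal_cost(10.0, 2.0, 1.0, 0.02, m, e, c) for m, e, c in lrus]
    print(expected_cost(costs, [0.6, 0.4]))
```

The same `expected_cost` helper applies to the LRU isolation cost CLI, which has the identical probability-weighted form.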


4.1.4  Shipping Cost from the Organizational Level to the Intermediate Level Per
LRU ($CS_{OI}$)

$$ CS_{OI} = (WL) \cdot (CPP_{OI}) $$

where

$WL$ = average weight of an LRU or group of LRUs

$$ WL = \sum_{i=1}^{N} (WL_i) \cdot n_O / N $$

$n_O$ = number of LRUs in the isolated group of LRUs

$CPP_{OI}$ = cost per pound of transportation and packaging between the
organizational and intermediate levels

$WL_i$ = weight of $LRU_i$

4.1.5  Cost  of  Mission  Abortion  (CMA) 

Cost  of  mission  abortion  is  all  the  costs  resulting  from  aborting  the  mission 
and  not  accomplishing  the  mission  goals  with  all  the  resulting  consequences. 

4.1.6  Cost  of  Mission  Failure  (CMF) 

Cost of mission failure includes the costs of all equipment and personnel
involved in the mission, plus the cost to pride and the national impact of the
mission failure.

4.1.7  Average  LRU  Cost  (CL) 

Average LRU cost, which is included in the cost of removal and replacement
(CLR), can be computed as follows:

$$ CL = \Big(\sum_{i=1}^{N} CL_i\Big) / N $$

where $CL_i$ = cost of $LRU_i$

4.1.8  Expected Cost Resulting from Having a Faulty Spare Part at the
Organizational Level ($CF_O$)

The expected cost resulting from having a faulty LRU coming from the
intermediate level as a spare part at the organizational level can be computed
using the decision tree of Figure 2.2, including all costs resulting from
replacing any LRU (faulty or good) by a faulty LRU, with the costs of all
consequences.

$$ CF_O = [1 - P(F)_O] \cdot P(FA)_O \cdot P(WI_i)_O \cdot (1 - V) \cdot (Asp) \cdot [CLI + CLR + CMF] $$
$$ \quad + [1 - P(F)_O] \cdot P(FA)_O \cdot P(WI_i)_O \cdot V \cdot (Asp) \cdot [CLI + CLR] $$
$$ \quad + P(F)_O \cdot P(FD)_O \cdot P(FI_i)_O \cdot (1 - V) \cdot (Asp) \cdot [CLI + CLR + CMF] $$
$$ \quad + P(F)_O \cdot P(FD)_O \cdot P(FI_i)_O \cdot V \cdot (Asp) \cdot [CLI + CLR] $$
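The four branches of the $CF_O$ expression share a common structure: an LRU is replaced (either a good one after a false alarm and wrong isolation, or a faulty one after correct detection and isolation), a spare is available, and the mission either survives the faulty spare or fails. The sketch below evaluates the expression branch by branch; all names are illustrative.

```python
def cf_o(p_f, p_fa, p_wi, p_fd, p_fi, v, asp, cli, clr, cmf):
    """Expected cost of installing a faulty spare at the organizational level."""
    # Probability that a good LRU is pulled (false alarm, wrong isolation).
    good_replaced = (1.0 - p_f) * p_fa * p_wi
    # Probability that a faulty LRU is correctly detected and isolated.
    faulty_replaced = p_f * p_fd * p_fi
    total = 0.0
    for p_branch in (good_replaced, faulty_replaced):
        # Mission does not survive the faulty spare: add mission failure cost.
        total += p_branch * (1.0 - v) * asp * (cli + clr + cmf)
        # Mission survives: only isolation and replacement costs are incurred.
        total += p_branch * v * asp * (cli + clr)
    return total
```

Summing the two V branches shows that each replacement path costs `asp * (cli + clr + (1 - v) * cmf)`, so the mission failure cost enters only through the factor (1 - V).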

4.1.9  Costs of External Tests Implementation ($CI_I$)

At  the  intermediate  level,  costs  associated  with  external  test  equipment 
include: 

1.  Acquisition  Costs  -  the  cost  of  procuring  the  external  test  equipment. 

2.  Initial  Logistics  Support  Costs 

a.  the  initial  cost  of  training  operators  and  maintenance  personnel. 

b.  any  one  time  cost  associated  with  introducing  the  external  test 
equipment  into  the  maintenance  cycle. 

The cost of external tests implementation is a one-time cost, charged every
time an LRU (faulty or good) is introduced to the intermediate level.


4.1.10  Operations  and  Support  Costs  of  External  Tests 

a.  software maintenance - if the equipment is semiautomatic and software
based.

b.  technical data maintenance - the cost of updating and revising technical
publications such as operator handbooks and maintenance manuals.

c.  attrition training - the cost of training new operators and maintenance
personnel.

d.  maintenance material - the cost of material to repair the test equipment
when it fails.

e.  maintenance labor - the cost of labor to maintain the test equipment.

f.  operations labor - the cost of employing the test.

Bogard et al. (1980) find that for external test equipment, operations labor
and software maintenance, when applicable, tend to be the dominant costs.

The cost of testing, screening and detection of failure in an LRU at the
intermediate level, CMD, can be computed as

$$ CMD = CI_I + n_O \, (ct)(SR)(TMD)(NMD) / [(UR)(H)] $$

where

TMD = average time required for testing, screening and detection of failure
in an LRU at the intermediate level

NMD = number of technicians required to test, screen and detect a failure in
an LRU at the intermediate level

ct = annual cost to provide a trained technician for maintenance (annual
labor cost) at the intermediate level

SR = shop support ratio (total personnel at the intermediate level divided
by the number of maintenance and operating personnel)

UR = manpower utilization rate at the intermediate level

H = number of working hours per year at the intermediate level


The average time required for diagnosis of the LRU failure at the
intermediate level, TMI, will be a direct function of LRU size. On the average,
one-half of the modules in the LRU will need to be examined in order to find the
faulty one. TMI is of the form

$$ TMI = \tfrac{1}{2} (TM)(M) \, n_O $$

where TM = average time to test one module for malfunction

$$ TM = \Big(\sum_{i=1}^{N} \sum_{j=1}^{M_i} TM_{ij}\Big) / \sum_{i=1}^{N} M_i , \text{ and} $$

$TM_{ij}$ = time to test module $j$ of $LRU_i$

$M$ = average number of modules in any LRU

$$ M = \Big(\sum_{i=1}^{N} M_i\Big) / N $$

and the cost of module isolation is

$$ CMI = (ct)(SR)(TMI)(NMI) / [(UR)(H)] $$

where

NMI = number of technicians required for isolating a failed module at the
intermediate level.
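The labor-based costs in this subsection share one factor: an effective hourly cost per technician. The sketch below assumes the denominator of the cost expressions is the product (UR)(H), so that (ct)(SR)/[(UR)(H)] is an hourly rate; that grouping is an interpretation of the printed formula, and all function names are illustrative.

```python
def hourly_rate(ct, sr, ur, h):
    """Effective hourly technician cost: annual cost * support ratio / (UR * H)."""
    return ct * sr / (ur * h)


def cmd(ci_i, n_o, rate, tmd, nmd):
    """Cost of testing, screening and detecting a failure in an LRU."""
    return ci_i + n_o * rate * tmd * nmd


def tmi(tm, m, n_o):
    """Average diagnosis time: half of the modules are examined on average."""
    return 0.5 * tm * m * n_o


def cmi(rate, tmi_hours, nmi):
    """Cost of isolating the faulty module."""
    return rate * tmi_hours * nmi


if __name__ == "__main__":
    # Hypothetical shop: 40,000 per technician-year, 2,000 hours per year.
    rate = hourly_rate(ct=40000.0, sr=1.2, ur=0.8, h=2000.0)
    print(cmd(ci_i=50.0, n_o=1, rate=rate, tmd=1.5, nmd=2))
    print(cmi(rate, tmi(tm=0.5, m=8, n_o=1), nmi=1))
```

Factoring out the hourly rate makes it easy to see that CMD, CMI and the module replacement labor term all scale with the same shop parameters.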

4.1.11  Average  Module  Cost  (CM) 

Average module cost can be computed as follows:

$$ CM = \Big(\sum_{i=1}^{N} \sum_{j=1}^{M_i} CM_{ij}\Big) / \sum_{i=1}^{N} M_i $$

where $CM_{ij}$ = cost of module $j$ in $LRU_i$


4.1.12  Module  Removal  and  Replacement  Cost  (CMR) 


The cost of removing and replacing modules, including the cost of modules, at
the intermediate level can be computed as follows:

$$ CMR = n_I \cdot (CM) + (ct)(SR)(TMR)(NMR) / [(UR)(H)] $$

where TMR = average time to remove and replace the faulty module and check out
an LRU at the intermediate level

$$ TMR = \Big(\sum_{i=1}^{N} \sum_{j=1}^{M_i} TMR_{ij}\Big) / \sum_{i=1}^{N} M_i $$

$TMR_{ij}$ = time to remove and replace module $j$ of $LRU_i$

NMR = number of technicians required to remove and replace a failed module at
the intermediate level

$n_I$ = number of modules in the isolated group of modules

4.1.13  Shipping Cost per Module Between the Intermediate Level and Depot Level
($CS_{ID}$)

Shipping cost per module between the intermediate level and depot level
also includes packaging and handling costs and can be expressed as:

$$ CS_{ID} = (WM)(CPP_{ID}) $$

where WM = average module weight

$$ WM = n_I \Big(\sum_{i=1}^{N} \sum_{j=1}^{M_i} WM_{ij}\Big) / \sum_{i=1}^{N} M_i $$

$WM_{ij}$ = weight of module $j$ of $LRU_i$

$CPP_{ID}$ = cost per pound of transportation and packaging between the
intermediate and depot levels.


4.1.14  Cost of Discarding Any Module (CMT)

Cost of discarding or throwing away any module includes all the costs
incurred to get rid of the module, or the scrap value if it can be sold
(negative cost).


4.1.15  Expected Cost Resulting from Having a Faulty Spare Part at the
Intermediate Level ($CF_I$)

The expected cost resulting from having a faulty module (coming from the
depot) as a spare part at the intermediate level can be computed using the
decision tree of Figure 2.2, including all costs resulting from replacing any
module (faulty or good) by a faulty module, with the costs of all consequences.

$$ CF_I = [1 - P(F)_I] \cdot P(FA)_I \cdot P(WI_i)_I \cdot [CMD + CMI + CMR + CS_{OI} + CF_O] $$
$$ \quad + P(F)_I \cdot P(FD)_I \cdot P(FI_i)_I \cdot [CMD + CMI + CMR + CS_{OI} + CF_O] $$


4.1.16  Cost of Testing Component/Part (CPD)

The cost of testing, screening and detection of failure in a module at the
depot is

$$ CPD = n_I \, (HDC)(TPD)(NPD) $$

where HDC = hourly depot time repair cost

TPD = time required for testing, screening and detection of a failure in a
module at the depot

NPD = number of repair persons required to test, screen and detect a failure
in a module at the depot.


4.1.17  Cost of Isolating Component/Part (CPI)

The cost of isolating components or parts at the depot is

$$ CPI = n_I \, (HDC)(TPI)(NPI) $$

where TPI = average time required for diagnosis of the module failure at the
depot

NPI = number of repair persons required for isolating a failed module at the
depot.


4.1.18  Average  Cost  of  Component/Part  (CP) 

Average cost of a component/part can be computed as follows:

$$ CP = \Big(\sum_{i=1}^{N} \sum_{j=1}^{M_i} \sum_{k=1}^{U_{ij}} CP_{ijk}\Big) / \sum_{i=1}^{N} \sum_{j=1}^{M_i} U_{ij} $$

where $CP_{ijk}$ = cost of component/part $k$ in module $j$ of $LRU_i$


4.1.19  Cost of Removing and Replacing Component/Part (CPR)

The cost of removing and replacing a component/part at the depot, including the
cost of the component/part, is:

$$ CPR = (HDC)(TPR)(NPR) + (CP) \, n_D $$

$n_D$ = number of components/parts in the isolated group of
components/parts

TPR = average time to remove and replace the faulty component and check out
the module

$TPR_{ijk}$ = time to remove and replace component $k$ in module $j$ of $LRU_i$

$U_{ij}$ = number of components in module $j$ of $LRU_i$

NPR = number of repair persons required to remove and replace a failed
component

4.1.20  Expected Cost of Introducing a Good Module to the Depot Level ($C_D$)

The expected cost of introducing a good module to the depot can be computed by
considering all decisions emanating from the node concerning a good RU
(component) in Figure 2.2, as well as the associated costs.

$$ C_D = [1 - P(FA)_D] \cdot [CPD + CS_{ID}] + P(FA)_D \cdot [1 - P(WI_i)_D] \cdot [CPD + CPI + CS_{ID}] $$
$$ \quad + P(FA)_D \cdot P(WI_i)_D \cdot [CPD + CPI + CPR + CPT + CS_{ID}] $$

4.1.21  Expected Cost of Introducing a Good LRU to the Intermediate Level ($C_I$)

The expected cost of introducing a good LRU to the intermediate level can be
computed by considering all decisions emanating from the node concerning a good
RU (module) in Figure 2.2, as well as the associated costs.

$$ C_I = [1 - P(FA)_I] \cdot [CMD + CS_{OI}] + P(FA)_I \cdot [1 - P(WI_i)_I] \cdot [CMD + CMI + CS_{OI}] $$
$$ \quad + P(FA)_I \cdot P(WI_i)_I \cdot [CMD + CMI + D_I \cdot (CMT) + (1 - D_I) \cdot C_D] $$

4.2  Costs  Associated  with  Testability  at  the  Depot 

In this section, costs associated with testability at the depot are
presented. This includes costs associated with α, β, and γ errors and those
associated with a successful performance of the diagnostic system. Then these
costs are used to develop a new cost-measure of effectiveness at the depot.


4.2.1  Expected Cost of α Error ($C\alpha_D$)

Costs associated with α error are the costs of testing, isolation, removal, and
replacement of a component/part, plus the cost of throwing away a good
component/part and returning a good module to the intermediate level as a good
spare part.

$$ C\alpha_D = [1 - P(F)_D] \cdot P(FA)_D \cdot P(WI_i)_D \cdot [CPD + CPI + CPR + CPT + CS_{ID}] $$

4.2.2  Expected Cost of β Error ($C\beta_D$)

Costs associated with β error are presented in Figure 4.1.

$$ C\beta_D = P(F)_D \cdot [1 - P(FD)_D] \cdot [CPD + CS_{ID} + CF_I] $$
$$ \quad + P(F)_D \cdot P(FD)_D \cdot [1 - P(FI_i)_D - P(FL_i)_D] \cdot [CPD + CPI + CS_{ID} + CF_I] $$
$$ \quad + P(F)_D \cdot P(FD)_D \cdot P(FL_i)_D \cdot [CPD + CPI + CPR + CPT + CS_{ID} + CF_I] $$

4.2.3  Expected Cost of γ Error ($C\gamma_D$)

Costs associated with γ error are those of unnecessary testing and isolation
of components, plus the cost of shipping the good module back to the
intermediate level.

$$ C\gamma_D = [1 - P(F)_D] \cdot P(FA)_D \cdot [1 - P(WI_i)_D] \cdot [CPD + CPI + CS_{ID}] $$

4.2.4  Cost-Measure  of  Effectiveness  at  the  Depot 

Using the costs associated with α, β, and γ errors at the depot level and the
probabilities of their occurrence, a new cost-measure of effectiveness can be
derived by computing the expected cost of errors of the diagnostic system:

(expected cost of diagnostic errors at the depot) $= C\alpha_D + C\beta_D + C\gamma_D$
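The three depot error costs of Sections 4.2.1 to 4.2.3 and their sum can be sketched in a few lines. The argument names mirror the symbols in the text (for example `p_fl` for the false isolation probability), but the function itself is only an illustration and the grouping of terms follows the reconstruction of the printed formulas.

```python
def depot_error_costs(p_f, p_fa, p_wi, p_fd, p_fi, p_fl,
                      cpd, cpi, cpr, cpt, cs_id, cf_i):
    """Return (alpha, beta, gamma) expected error costs at the depot."""
    full_repair = cpd + cpi + cpr + cpt + cs_id
    # alpha: good module, false alarm, a good component is wrongly replaced.
    alpha = (1.0 - p_f) * p_fa * p_wi * full_repair
    # beta: faulty module that is missed, not isolated, or falsely isolated;
    # each branch ends with a faulty spare at the intermediate level (cf_i).
    beta = (
        p_f * (1.0 - p_fd) * (cpd + cs_id + cf_i)
        + p_f * p_fd * (1.0 - p_fi - p_fl) * (cpd + cpi + cs_id + cf_i)
        + p_f * p_fd * p_fl * (full_repair + cf_i)
    )
    # gamma: good module, false alarm, but no component is replaced.
    gamma = (1.0 - p_f) * p_fa * (1.0 - p_wi) * (cpd + cpi + cs_id)
    return alpha, beta, gamma


def depot_cost_measure(*args, **kwargs):
    """Expected cost of diagnostic errors: alpha + beta + gamma."""
    return sum(depot_error_costs(*args, **kwargs))
```

With a perfect diagnostic system (no false alarms, certain detection and isolation, no false isolation) all three terms are zero, which is the sense in which the sum acts as a measure of effectiveness.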


[Figure 4.1  Costs Associated with β Error at the Depot. Recoverable labels:
shipping cost from depot to intermediate level; expected cost of having a
faulty module as a spare part at the intermediate level.]


4.2.5  Expected Costs of a Successful Performance of the Diagnostic System at the
Depot ($CS_D$)

The costs associated with a successful performance of the diagnostic system
include costs involved when a faulty module is introduced to the depot, as
presented in Figure 4.2, and those involved when a good module is introduced to
the depot.

$$ CS_D = P(F)_D \cdot P(FD)_D \cdot P(FI_i)_D \cdot [CPD + CPI + CPR + CPT + CS_{ID}] $$
$$ \quad + [1 - P(F)_D] \cdot [1 - P(FA)_D] \cdot [CPD + CS_{ID}] $$


4.2.6  Expected Cost of Introducing a Faulty Module to the Depot ($CG_D$)

The costs involved are the costs associated with both β error and a
successful performance of the diagnostic system when a faulty module is
introduced to the depot.

$$ CG_D = C\beta_D + CS_D - [1 - P(F)_D] \cdot [1 - P(FA)_D] \cdot [CPD + CS_{ID}] $$


4.3  Costs  Associated  with  Testability  at  the  Intermediate  Level 

In this section, costs associated with testability at the intermediate level
are presented. This includes costs associated with α, β, and γ errors, and with
a successful performance of the diagnostic system. Then these costs are used to
develop a new cost-measure of effectiveness at the intermediate level.


4.3.1  Expected Cost of α Error ($C\alpha_I$)

Costs associated with α error are presented in Figure 4.3.

$$ C\alpha_I = [1 - P(F)_I] \cdot P(FA)_I \cdot P(WI_i)_I \cdot [CMD + CMI + CMR + D_I (CMT) + (1 - D_I)(CS_{ID} + C_D)] $$
4.3.2  Expected Cost of β Error ($C\beta_I$)


Costs associated with β error are presented in Figure 4.4.

$$ C\beta_I = P(F)_I \cdot [1 - P(FD)_I] \cdot (CMD + CS_{OI} + CF_O) $$
$$ \quad + P(F)_I \cdot P(FD)_I \cdot [1 - P(FI_i)_I - P(FL_i)_I] \cdot (CMD + CMI + CS_{OI} + CF_O) $$
$$ \quad + P(F)_I \cdot P(FD)_I \cdot P(FL_i)_I \cdot (CMD + CMI + CMR + D_I (CMT) + CS_{OI} + CF_O + (1 - D_I)(CS_{ID} + C_D)) $$

4.3.3  Expected Cost of γ Error ($C\gamma_I$)

Costs associated with γ error are those of unnecessary module testing,
isolation, and shipping the good LRU back to the organizational level.

$$ C\gamma_I = [1 - P(F)_I] \cdot P(FA)_I \cdot [1 - P(WI_i)_I] \cdot [CMD + CMI + CS_{OI}] $$


4.3.4  Cost-Measure  of  Effectiveness  at  the  Intermediate  Level 

Using the costs associated with α, β, and γ errors at the intermediate level
and the probabilities of their occurrence, a new cost-measure of effectiveness
can be derived by computing the expected cost of errors of the diagnostic
system, where

(expected cost of diagnostic errors at the intermediate level) $= C\alpha_I + C\beta_I + C\gamma_I$


4.3.5  Expected Cost of a Successful Performance of the External Test Equipment at
the Intermediate Level ($CS_I$)

The costs associated with a successful performance of the external test
equipment are those involved when a faulty LRU is introduced to the intermediate
level, as presented in Figure 4.5, and those involved when a good LRU is
introduced to the intermediate level.

$$ CS_I = P(F)_I \cdot P(FD)_I \cdot P(FI_i)_I \cdot [CMD + CMI + CMR + D_I (CMT) + (1 - D_I)(CS_{ID} + CG_D)] $$
$$ \quad + [1 - P(F)_I] \cdot [1 - P(FA)_I] \cdot [CMD + CS_{OI}] $$


[Figure 4.4  Costs Associated with β Error at the Intermediate Level.
Recoverable label: one module is faulty.]

[Figure 4.5  Costs Associated with Successful Performance at the Intermediate
Level]


4.3.6 Expected Cost of Introducing a Faulty Module to the Intermediate Level ($CG_I$)

The costs involved are the costs associated with both β error and a successful performance of the external test equipment at the intermediate level, excluding the part of the latter that arises when a good LRU is introduced.

$$
CG_I = C_{\beta I} + C_{\delta I} - [1 - P(F)_I] \cdot [1 - P(FA)_I] \cdot [CMD + CS_{OI}]
$$
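
The subtraction in the expression for the cost of a faulty item at the intermediate level removes exactly the good-LRU term of the successful-performance cost. A small sketch under stated assumptions (hypothetical key names and values; D again taken as a discard fraction) makes this explicit:

```python
def cost_faulty_item_at_intermediate(p, c, c_beta):
    """Return (C_delta_I, CG_I) given the beta-error cost c_beta."""
    # Cost once the faulty module is pulled: replace it, then discard or ship to depot.
    pulled = c["CMR"] + p["D"] * c["CMT"] + (1 - p["D"]) * (c["CS_ID"] + c["CG_D"])
    # Successful performance on a faulty LRU: detect, isolate the right module, repair.
    delta_faulty = p["F"] * p["FD"] * p["FI"] * (c["CMD"] + c["CMI"] + pulled)
    # Successful performance on a good LRU: test it, report no fault, ship it back.
    delta_good = (1 - p["F"]) * (1 - p["FA"]) * (c["CMD"] + c["CS_OI"])
    c_delta = delta_faulty + delta_good
    # CG_I keeps only the outcomes that start from a faulty item.
    cg = c_beta + c_delta - delta_good
    return c_delta, cg


# Hypothetical inputs, for illustration only.
probs = {"F": 0.2, "FA": 0.05, "FD": 0.95, "FI": 0.9, "D": 0.3}
costs = {"CMD": 10, "CMI": 15, "CMR": 20, "CMT": 100,
         "CS_OI": 5, "CS_ID": 8, "CG_D": 60}
c_delta, cg = cost_faulty_item_at_intermediate(probs, costs, c_beta=12.0)
print(f"C_delta_I={c_delta:.3f}  CG_I={cg:.3f}")
```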

4.4 Costs Associated with Testability at the Organizational Level

In this section, costs associated with testability at the organizational level are presented. These include the costs associated with α, β, and γ errors, and with a successful performance of the BIT/ATE system. These costs are then used to develop a new cost-measure of effectiveness at the organizational level as well as an expected life cycle cost of the equipment/system.


4.4.1 Expected Cost of α Error ($C_{\alpha O}$)

Costs associated with α error are presented in Figure 4.6,

$$
\begin{aligned}
C_{\alpha O} = [1 - P(F)_O] \cdot P(FA)_O \cdot P(WI_i)_O \cdot \{ & [Asp + V(1 - Asp)] \, [CLI + CLR + CS_{OI} + C_I] \\
&+ (1 - Asp)(1 - V) \, [CMA + CLI + CLR + CS_{OI} + C_I] \}
\end{aligned}
$$


4.4.2 Expected Cost of β Error ($C_{\beta O}$)

Costs associated with β error are presented in Figure 4.7.

$$
\begin{aligned}
C_{\beta O} ={}& P(F)_O \cdot [1 - P(FD)_O] \, (1 - V)(CMF) \\
&+ P(F)_O \cdot P(FD)_O \cdot [1 - P(FI_i)_O - P(FL_i)_O] \, [CLI + (1 - V)(CMF)] \\
&+ P(F)_O \cdot P(FD)_O \cdot P(FL_i)_O \cdot \{ (1 - V)(CMF) + V(1 - Asp)(CMA + CLI + CLR + CS_{OI} + C_I) \\
&\qquad + V(Asp) \, [CLI + CLR + CS_{OI} + C_I] \}
\end{aligned}
$$


4.4.3 Expected Cost of γ Error ($C_{\gamma O}$)

Costs associated with γ error are only the unnecessary LRU isolation cost and the costs of interruptions and confusion, which might lead to mission abortion. Hence,

$$
C_{\gamma O} = [1 - P(F)_O] \cdot P(FA)_O \cdot [1 - P(WI_i)_O] \cdot [(CMA) \cdot P(MA) + CLI]
$$


4.4.4 Cost-Measure of Effectiveness at the Organizational Level

Using the costs associated with α, β, and γ errors at the organizational level and the probability of their occurrences, a new cost-measure of effectiveness can be derived by computing the expected cost of the errors of the diagnostic system ($C_O$), where

$$
C_O = C_{\alpha O} + C_{\beta O} + C_{\gamma O}
$$

This measure represents the actual burden of the imperfection of the diagnostic system, including both the probabilities of errors and the costs resulting from these errors.
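
The organizational-level measure of Sections 4.4.1 to 4.4.4 can be evaluated in the same way. This is a sketch only: the key names and numbers are hypothetical, and Asp and V are used exactly as they appear in the equations above, as values between 0 and 1, since their definitions lie outside this section.

```python
def organizational_error_costs(p, c, asp, v):
    """Expected alpha, beta, gamma error costs at the organizational level.

    p: probabilities F, FA, WI, FD, FI, FL, MA
    c: costs CLI, CLR, CS_OI, C_I, CMA, CMF
    asp, v: the Asp and V factors of Section 4.4 (assumed to lie in [0, 1])
    """
    # Cost of pulling an LRU and sending it to the intermediate level.
    removal = c["CLI"] + c["CLR"] + c["CS_OI"] + c["C_I"]

    # alpha: good equipment, false alarm, a good LRU is isolated and removed.
    c_alpha = (1 - p["F"]) * p["FA"] * p["WI"] * (
        (asp + v * (1 - asp)) * removal
        + (1 - asp) * (1 - v) * (c["CMA"] + removal))

    # beta: failure missed, detected but not isolated, or isolated to the wrong LRU.
    c_beta = (
        p["F"] * (1 - p["FD"]) * (1 - v) * c["CMF"]
        + p["F"] * p["FD"] * (1 - p["FI"] - p["FL"]) * (c["CLI"] + (1 - v) * c["CMF"])
        + p["F"] * p["FD"] * p["FL"] * (
            (1 - v) * c["CMF"]
            + v * (1 - asp) * (c["CMA"] + removal)
            + v * asp * removal))

    # gamma: good equipment, false alarm, no LRU isolated.
    c_gamma = ((1 - p["F"]) * p["FA"] * (1 - p["WI"])
               * (c["CMA"] * p["MA"] + c["CLI"]))

    return c_alpha, c_beta, c_gamma, c_alpha + c_beta + c_gamma


# Hypothetical inputs, for illustration only.
probs = {"F": 0.1, "FA": 0.03, "WI": 0.5, "FD": 0.97, "FI": 0.92, "FL": 0.05, "MA": 0.2}
costs = {"CLI": 30, "CLR": 40, "CS_OI": 5, "C_I": 25, "CMA": 2000, "CMF": 10000}
c_alpha, c_beta, c_gamma, c_org = organizational_error_costs(probs, costs, asp=0.8, v=0.6)
print(f"C_alpha={c_alpha:.2f}  C_beta={c_beta:.2f}  C_gamma={c_gamma:.2f}  C_O={c_org:.2f}")
```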


4.4.5 Expected Cost of a Successful Performance of the BIT/ATE System ($C_{\delta O}$)

The costs associated with a successful performance of the BIT/ATE system are the LRU isolation costs, the shipping costs to the intermediate level, and the expected cost of introducing a faulty LRU to the intermediate level, as shown in Figure 4.8.

$$
\begin{aligned}
C_{\delta O} = P(F)_O \cdot P(FD)_O \cdot P(FI_i)_O \cdot \{ & (Asp) \, [CLI + CLR + CS_{OI} + CG_I] \\
&+ (1 - Asp) \, [ (V) \, [CLI + CLR + CS_{OI} + CG_I] + (1 - V)(CMF) ] \}
\end{aligned}
$$


[Figure 4.8: Costs Associated with Successful Performance at the Organizational Level (root node: an LRU is faulty)]

4.4.6 Life Cycle Cost of the Equipment/System

Every time the equipment/system is used, the following costs are involved:

1. Cost of a successful performance of the BIT/ATE ($C_{\delta O}$)

That includes all the expected costs of correctly detecting and isolating a failure and the expected cost of introducing a faulty LRU to the intermediate level, with the further possibilities of introducing a good or faulty module to the depot (that is, the actual expected values of repairing the system at different levels).

2. Costs associated with α, β, and γ errors ($C_{\alpha O}$, $C_{\beta O}$, and $C_{\gamma O}$)

That includes all costs which arise from the imperfection of testability of diagnostic systems at different levels, with all the consequences such as mission abortion, mission failure, introducing a good LRU to the intermediate level as a result of false removal, etc.

A realistic figure for the expected operating and maintenance costs of the equipment/system ($C_j$) can be modeled by multiplying the above costs ($C_{\alpha O} + C_{\beta O} + C_{\gamma O} + C_{\delta O}$) by the number of times the equipment/system will be in operation during its lifetime (the number of missions, for example). This cost represents all operation and maintenance costs and implicitly includes the failure rate of the equipment, the time to repair, and the costs resulting from the imperfection of testability at different levels.

Now, in order to find the expected life cycle costs of the equipment/system, the above cost ($C_j$) should be added to all one-time costs such as initial cost of the equipment, implementation cost of the BIT/ATE system, technical manuals, attrition training, in addition to maintenance material, operations facility space, etc.
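
The life cycle calculation described in this section is a per-use expected cost scaled by the number of uses, plus the one-time costs. A minimal sketch follows; the function names, the cost categories in the dictionary, and every figure are hypothetical.

```python
def operating_and_maintenance_cost(c_alpha_o, c_beta_o, c_gamma_o, c_delta_o, n_uses):
    """Expected O&M cost: per-use expected cost times the number of uses (missions)."""
    return n_uses * (c_alpha_o + c_beta_o + c_gamma_o + c_delta_o)


def life_cycle_cost(om_cost, one_time_costs):
    """Expected life cycle cost: O&M cost plus the sum of all one-time costs."""
    return om_cost + sum(one_time_costs.values())


# Hypothetical per-use expected costs and one-time costs, for illustration only.
om = operating_and_maintenance_cost(1.5, 4.0, 0.5, 20.0, n_uses=1000)
one_time = {"equipment": 500_000, "bit_ate_implementation": 80_000, "technical_manuals": 5_000}
lcc = life_cycle_cost(om, one_time)
print(f"O&M cost = {om:.1f}, life cycle cost = {lcc:.1f}")
```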


4.5 Costs Associated with α, β, and γ Errors for the Composite Organizational/Intermediate System

In this section, costs associated with all types of errors (α, β, and γ) for the composite organizational/intermediate system are presented, computed, and used to develop a new cost-measure of effectiveness.

4.5.1 Expected Cost of α Error ($C_{\alpha OI}$)

Costs associated with α error are presented in Figure 4.9,

$$
\begin{aligned}
C_{\alpha OI} ={}& [1 - P(F)_O] \cdot P(FA)_O \cdot P(WI_i)_O \cdot P(FA)_I \cdot P(WI_i)_I \cdot \{ \\
& [Asp + V(1 - Asp)] \, [CLI + CLR + CS_{OI} + CMD + CMI + CMR + D_I\,(CMT) + (1 - D_I)(CS_{ID} + C_D)] \\
&+ (1 - Asp)(1 - V) \, [CMA + CLI + CLR + CS_{OI} + CMD + CMI + CMR + D_I\,(CMT) + (1 - D_I)(CS_{ID} + C_D)] \}
\end{aligned}
$$

4.5.2 Expected Cost of β Error ($C_{\beta OI}$)

Costs associated with β error are presented in Figure 4.10, where

$$
\begin{aligned}
C_{\beta OI} ={}& P(F)_O \cdot [1 - P(FD)_O] \, (1 - V)(CMF) \\
&+ P(F)_O \cdot P(FD)_O \cdot [1 - P(FI_i)_O] \, (1 - V)(CLI + CMF) \\
&+ P(F)_O \cdot P(FD)_O \cdot P(FL_i)_O \cdot [1 - P(FA)_I] \, (V) \, [ (1 - Asp)(CMA + CLI + CLR + 2CS_{OI} + CMD) \\
&\qquad + Asp \, (CLI + CLR + 2CS_{OI} + CMD) ] \\
&+ P(F)_O \cdot P(FD)_O \cdot P(FL_i)_O \cdot P(FA)_I \cdot [1 - P(WI_i)_I] \, (V) \, [ (1 - Asp)(CMA + CLI + CLR + 2CS_{OI} + CMD + CMI) \\
&\qquad + Asp \, (CLI + CLR + 2CS_{OI} + CMD + CMI) ] \\
&+ P(F)_O \cdot P(FD)_O \cdot P(FL_i)_O \cdot P(FA)_I \cdot P(WI_i)_I \, (V) \, [ (1 - Asp)(CMA + CLI + CLR + CS_{OI} + CMD + CMI + CMR + D_I\,(CMT) + (1 - D_I)(CS_{ID} + C_D)) \\
&\qquad + Asp \, (CLI + CLR + CS_{OI} + CMD + CMI + CMR + D_I\,(CMT) + (1 - D_I)(CS_{ID} + C_D)) ] \\
&+ P(F)_O \cdot P(FD)_O \cdot P(FI_i)_O \cdot [1 - P(FD)_I] \, [CLI + CLR + 2CS_{OI} + CMD + CF_O] \\
&+ P(F)_O \cdot P(FD)_O \cdot P(FI_i)_O \cdot P(FD)_I \cdot [1 - P(FL_i)_I - P(FI_i)_I] \, [CLI + CLR + 2CS_{OI} + CMD + CMI + CF_O] \\
&+ P(F)_O \cdot P(FD)_O \cdot P(FI_i)_O \cdot P(FD)_I \cdot P(FL_i)_I \, [CLI + CLR + 2CS_{OI} + CMD + CMI + CMR + CF_O + D_I\,(CMT) + (1 - D_I)(CS_{ID} + C_D)]
\end{aligned}
$$

[Figure residue: cost tree listing LRU isolation cost; cost of mission abortion; LRU removal and replacement cost; shipping cost to intermediate level; module testing cost; module isolation cost; module removal and replacement cost; discard module? (yes: $D_I$); shipping cost to depot; expected cost of introducing a good module to depot]

4.5.3 Expected Cost of γ Error ($C_{\gamma OI}$)

Costs associated with γ error are presented in Figure 4.11, where

$$
\begin{aligned}
C_{\gamma OI} ={}& [1 - P(F)_O] \cdot P(FA)_O \cdot [1 - P(WI_i)_O] \, [CLI + (CMA) \cdot P(MA)] \\
&+ [1 - P(F)_O] \cdot P(FA)_O \cdot P(WI_i)_O \cdot [1 - P(FA)_I] \cdot [ (Asp + V(1 - Asp)) \cdot (CLI + CLR + 2CS_{OI} + CMD) \\
&\qquad + (1 - Asp)(1 - V)(CMA + CLI + CLR + 2CS_{OI} + CMD) ] \\
&+ [1 - P(F)_O] \cdot P(FA)_O \cdot P(WI_i)_O \cdot P(FA)_I \cdot [1 - P(WI_i)_I] \cdot [ (Asp + V(1 - Asp)) \cdot (CLI + CLR + 2CS_{OI} + CMD + CMI) \\
&\qquad + (1 - Asp)(1 - V)(CMA + CLI + CLR + 2CS_{OI} + CMD + CMI) ]
\end{aligned}
$$

4.5.4 Cost-Measure of Effectiveness for the Composite Organizational/Intermediate System

Using the costs associated with α, β, and γ errors for the composite organizational/intermediate system, and the probability of their occurrences, a new cost-measure of effectiveness can be derived by computing the expected cost of the errors of the diagnostic system ($C_{OI}$), where

$$
C_{OI} = C_{\alpha OI} + C_{\beta OI} + C_{\gamma OI}
$$
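
Each composite term is a chain of event probabilities across the two levels multiplied by the costs accumulated along that path. As an illustration, the composite α error cost of Section 4.5.1 can be sketched as follows. The key names and values are hypothetical, and Asp, V, and D are treated as values between 0 and 1, which is an assumption.

```python
def composite_alpha_cost(p_o, p_i, c, asp, v):
    """Expected alpha-error cost for the composite organizational/intermediate system."""
    # Chain of events: good equipment, false alarm and wrong isolation at the
    # organizational level, then false alarm and wrong isolation at the intermediate level.
    chain = (1 - p_o["F"]) * p_o["FA"] * p_o["WI"] * p_i["FA"] * p_i["WI"]
    # Costs accumulated from LRU isolation through module disposition.
    path = (c["CLI"] + c["CLR"] + c["CS_OI"] + c["CMD"] + c["CMI"] + c["CMR"]
            + p_i["D"] * c["CMT"] + (1 - p_i["D"]) * (c["CS_ID"] + c["C_D"]))
    # The (1 - Asp)(1 - V) branch additionally incurs the mission abortion cost CMA.
    return chain * ((asp + v * (1 - asp)) * path
                    + (1 - asp) * (1 - v) * (c["CMA"] + path))


# Hypothetical inputs, for illustration only.
p_org = {"F": 0.1, "FA": 0.03, "WI": 0.5}
p_int = {"FA": 0.05, "WI": 0.4, "D": 0.3}
costs = {"CLI": 30, "CLR": 40, "CS_OI": 5, "CMD": 10, "CMI": 15, "CMR": 20,
         "CMT": 100, "CS_ID": 8, "C_D": 50, "CMA": 2000}
print(f"C_alpha_OI = {composite_alpha_cost(p_org, p_int, costs, asp=0.8, v=0.6):.4f}")
```

Because the two branch weights, Asp + V(1 - Asp) and (1 - Asp)(1 - V), sum to one, the bracket reduces to the path cost plus (1 - Asp)(1 - V) times CMA, which gives a convenient check on the arithmetic.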

5.  System  Diagnosis  Model 

The system diagnosis can be modeled using a Markov transition matrix. This matrix describes the actual transitions of the system, taking into consideration the imperfection of the diagnostic system and all the resulting errors.

In general, the system can be in one of the following states:

State 0: Equipment is good (diagnostic system does not report any failure | equipment is good)

State 1: Equipment is faulty and failure is not diagnosed (no detection or isolation of failure)

State 2: Equipment is faulty and the failure is correctly detected

State 3: Equipment is faulty and the faulty LRU is correctly isolated

State 4: Equipment is faulty and a good LRU is mistakenly isolated (false isolation)

State 5: Equipment is good but the diagnostic system reports a failure (false alarm)

State 6: Equipment is good but the diagnostic system mistakenly isolates a good LRU (false report).

Let

$\lambda$ = failure rate of the equipment

$\lambda_f$ = false alarm rate

$\lambda_d$ = rate of correct detection (given that there is a failure)

$\lambda_{fi}$ = rate of correctly isolating the faulty LRU if the failure has been detected

$\lambda_{fl}$ = rate of mistakenly isolating a good LRU if a fault is detected in a faulty equipment

$\mu$ = rate of removing and replacing LRUs (rate of repair)

$\lambda_{ni}$ = rate of isolating no LRUs after detecting a failure [equipment is faulty]

$\lambda_{fr}$ = rate of isolating a good LRU after mistakenly reporting a failure (false alarm) = [rate of false report | false alarm]

$\lambda_c$ = [rate of CND | false alarm] = rate of isolating no LRUs after mistakenly reporting a failure (false alarm)

5.1  Transition  Probabilities 
At  State  0 

Equipment is good (diagnostic system does not report any failure | equipment is good). If the system is in state 0 at time t, it can make one of the following transitions in (t, t+dt):

1 - a transition to state 1 if the equipment fails, with probability $\lambda\,dt$

2 - a transition to state 5 if the diagnostic system reports a failure (false alarm), with probability $\lambda_f\,dt$

3 - remain in state 0 if no failure or false alarm occurs, with probability $1 - \lambda\,dt - \lambda_f\,dt$

At  State  1 

Equipment is faulty and failure is not diagnosed (no detection or isolation of failure). If the system is in state 1 at time t, it can make one of the following transitions in (t, t+dt):

1 - a transition to state 2 if the diagnostic system detects the failure, with probability $\lambda_d\,dt$

2 - remain in state 1 if the diagnostic system does not detect the failure, with probability $1 - \lambda_d\,dt$

At  State  2 

Equipment is faulty and the fault is correctly detected. If the system is in state 2 at time t, it can make one of the following transitions in (t, t+dt):

1 - a transition to state 3 if the faulty LRU is correctly isolated, with probability $\lambda_{fi}\,dt$

2 - a transition to state 4 if a good LRU is mistakenly isolated (false isolation), with probability $\lambda_{fl}\,dt$

3 - a transition to state 1 if no LRU is isolated, with probability $\lambda_{ni}\,dt$

4 - remain in state 2 if none of the above occurs, with probability $1 - \lambda_{fi}\,dt - \lambda_{fl}\,dt - \lambda_{ni}\,dt$


At State 3

Equipment is faulty and the faulty LRU is correctly isolated. If the system is in state 3 at time t, it can make one of the following transitions in (t, t+dt):

1 - a transition to state 0 if the equipment is repaired by removing the isolated faulty LRU and replacing it with a good one, with probability $\mu\,dt$

2 - remain in state 3 if the repair is not completed, with probability $1 - \mu\,dt$


At  State  4 


Equipment is faulty and a good LRU is mistakenly isolated (false isolation). If the system is in state 4 at time t, it can make one of the following transitions in (t, t+dt):

1 - a transition to state 1 if the good LRU is removed and replaced by another LRU (the equipment is still faulty), with probability $\mu\,dt$

2 - remain in state 4 if the unnecessary repair is not completed, with probability $1 - \mu\,dt$


At  State  5 


Equipment is good but the diagnostic system reports a failure (false alarm). If the system is in state 5 at time t, it can make one of the following transitions:

1 - a transition to state 1 if the equipment fails, with probability $\lambda\,dt$

2 - a transition to state 0 if the diagnostic system does not isolate any failure (CND), with probability $\lambda_c\,dt$

3 - a transition to state 6 if the diagnostic system isolates a good LRU, with probability $\lambda_{fr}\,dt$

4 - remain in state 5 if none of the above occurs, with probability $1 - \lambda\,dt - \lambda_c\,dt - \lambda_{fr}\,dt$

At  State  6 

Equipment is good but the diagnostic system mistakenly isolates a good LRU (false report). If the system is in state 6 at time t, it can make one of the following transitions:

1 - a transition to state 0 if the isolated LRU is removed and replaced by another good LRU, with probability $\mu\,dt$

2 - remain in state 6 if the unnecessary repair is not completed, with probability $1 - \mu\,dt$

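Therefore, the Markov transition matrix, with rows and columns ordered by states 0 through 6 and the factor dt omitted from each rate, is

$$
P = \begin{pmatrix}
1-\lambda-\lambda_f & \lambda & 0 & 0 & 0 & \lambda_f & 0 \\
0 & 1-\lambda_d & \lambda_d & 0 & 0 & 0 & 0 \\
0 & \lambda_{ni} & 1-\lambda_{fi}-\lambda_{fl}-\lambda_{ni} & \lambda_{fi} & \lambda_{fl} & 0 & 0 \\
\mu & 0 & 0 & 1-\mu & 0 & 0 & 0 \\
0 & \mu & 0 & 0 & 1-\mu & 0 & 0 \\
\lambda_c & \lambda & 0 & 0 & 0 & 1-\lambda-\lambda_c-\lambda_{fr} & \lambda_{fr} \\
\mu & 0 & 0 & 0 & 0 & 0 & 1-\mu
\end{pmatrix}
$$

and the corresponding Markov transition diagram is presented in Figure 5.1.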
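
The state-by-state transition probabilities of Section 5.1 can be assembled into a matrix programmatically, which also makes it easy to check that every row sums to one. The sketch below is illustrative only: the dictionary keys are hypothetical names for the rates defined above, and the rate values and time step are invented.

```python
def transition_matrix(r, dt):
    """Build the 7-state one-step transition matrix for a small time step dt."""
    # (from_state, to_state, rate) triples, following the transitions listed in Section 5.1.
    moves = [
        (0, 1, r["lam"]), (0, 5, r["lam_f"]),
        (1, 2, r["lam_d"]),
        (2, 3, r["lam_fi"]), (2, 4, r["lam_fl"]), (2, 1, r["lam_ni"]),
        (3, 0, r["mu"]),
        (4, 1, r["mu"]),
        (5, 1, r["lam"]), (5, 0, r["lam_c"]), (5, 6, r["lam_fr"]),
        (6, 0, r["mu"]),
    ]
    P = [[0.0] * 7 for _ in range(7)]
    for i, j, rate in moves:
        P[i][j] = rate * dt
    # The diagonal holds the probability of remaining in the same state.
    for i in range(7):
        P[i][i] = 1.0 - sum(P[i])
    return P


# Hypothetical rates (per hour) and time step, for illustration only.
rates = {"lam": 0.01, "lam_f": 0.02, "lam_d": 5.0, "lam_fi": 8.0, "lam_fl": 1.0,
         "lam_ni": 1.0, "lam_c": 3.0, "lam_fr": 1.0, "mu": 2.0}
P = transition_matrix(rates, dt=0.01)
for state, row in enumerate(P):
    print(state, [round(x, 4) for x in row])
```

The time step must be small enough that every diagonal entry stays non-negative; otherwise the entries no longer represent probabilities.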


Using the above model, many properties of the system can be determined. Some of them are:

1) $P_i(t)$ = probability that the system is in state i at time t

2) Steady-state probability, which is the probability that the system is in state i in the long run, or the proportion of time the system spends in each state.

3) Availability: $A(\infty)$ = steady-state availability = $\sum_j P_j$, where j ranges over all acceptable states

4) Reliability function at any time t: $R(t) = \sum_j P_j(t)$, where j ranges over all acceptable states [in case the failure state is an absorbing state]

5) Mean time to failure (MTTF)

6) Maintainability, by using the mean recurrence time (length of time to return to an acceptable state from a failed state)
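
As an illustration of properties 2 and 3, the steady-state probabilities can be obtained by solving the balance equations together with the normalization condition, and the availability follows by summing over the acceptable states. The sketch below is one possible approach, not the report's procedure: it uses plain Gauss-Jordan elimination, invented rate values, hypothetical key names, and it assumes that only state 0 counts as acceptable, which the text does not specify.

```python
def transition_matrix(r, dt):
    """7-state one-step transition matrix built from the Section 5.1 transitions."""
    moves = [(0, 1, r["lam"]), (0, 5, r["lam_f"]), (1, 2, r["lam_d"]),
             (2, 3, r["lam_fi"]), (2, 4, r["lam_fl"]), (2, 1, r["lam_ni"]),
             (3, 0, r["mu"]), (4, 1, r["mu"]),
             (5, 1, r["lam"]), (5, 0, r["lam_c"]), (5, 6, r["lam_fr"]), (6, 0, r["mu"])]
    P = [[0.0] * 7 for _ in range(7)]
    for i, j, rate in moves:
        P[i][j] = rate * dt
    for i in range(7):
        P[i][i] = 1.0 - sum(P[i])
    return P


def steady_state(P):
    """Solve pi = pi P with sum(pi) = 1 by Gauss-Jordan elimination."""
    n = len(P)
    # Row i of A encodes sum_j pi_j P[j][i] - pi_i = 0; the last row becomes sum(pi) = 1.
    A = [[P[j][i] - (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    A[-1] = [1.0] * n
    b = [0.0] * (n - 1) + [1.0]
    for col in range(n):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, n), key=lambda row: abs(A[row][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(n):
            if row != col:
                f = A[row][col] / A[col][col]
                A[row] = [x - f * y for x, y in zip(A[row], A[col])]
                b[row] -= f * b[col]
    return [b[i] / A[i][i] for i in range(n)]


# Hypothetical rates and time step, for illustration only.
rates = {"lam": 0.01, "lam_f": 0.02, "lam_d": 5.0, "lam_fi": 8.0, "lam_fl": 1.0,
         "lam_ni": 1.0, "lam_c": 3.0, "lam_fr": 1.0, "mu": 2.0}
P = transition_matrix(rates, dt=0.01)
pi = steady_state(P)
ACCEPTABLE = {0}  # assumption: only "equipment good, no report" counts as acceptable
availability = sum(pi[j] for j in ACCEPTABLE)
print([round(x, 5) for x in pi], round(availability, 5))
```

Adding state 5 to the acceptable set (equipment good, false alarm pending) gives a more lenient availability figure; comparing the two shows how much of the unavailability is due to false alarms alone.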

All the above properties of the Markov process can be very valuable tools in the evaluation of the actual performance of testability, considering the imperfections of the diagnostic systems. Work is currently in progress in this direction to study all these properties thoroughly and to tie the states of the system together in a dynamic fashion with the costs associated with all errors of testability. This approach, hopefully, will lead to a very efficient way to determine and evaluate different testability policies.

6.  Summary  and  Conclusions 

6.1  Summary 

A multi-level maintenance tier testability evaluation model is developed. This model evaluates analytically the testability parameters at the organizational, intermediate, and depot levels. In addition, it describes three measures of effectiveness of the performance of the multi-level testability systems, taking into account the imperfections of the diagnostic system (false alarms, Can Not Duplicate, and Retest Okay).

Furthermore,  all  costs  associated  with  the  errors  of  the  diagnostic  system 
are  developed  and  modeled  to  express  the  effectiveness  of  the  diagnostic  system. 
These  costs  are  also  used  to  predict  the  life  cycle  cost  for  the  equipment  taking 
into  account  the  actual  performance  of  the  diagnostic  system  and  the  resulting 


6.2  Conclusions 


Several conclusions can be drawn from this research regarding the performance models of testabilities. They are:

1.  Testability  parameters,  at  different  levels,  are  analyzed  and  evaluated  in 
order  to  accurately  represent  the  actual  performance  of  diagnostic  systems  and 
measure  the  reliability  of  the  system  in  accordance  with  a  multi-level 
maintenance  plan.  In  addition,  analytical  models  are  developed  to  compute 
these  testability  parameters. 

2. A multi-parameter testability evaluation model is developed. It contains all levels of maintainability parameters at the organizational, intermediate, and depot levels. This model is based on three measures of effectiveness which show the real accuracy and precision of the diagnostic system and cover all imperfections of the diagnostic system such as false alarm, incorrect isolation, failure to detect, CND, etc. These measures could be used as an efficient tool to assess and evaluate the performance of any diagnostic system at one or more levels of repair. In addition, costs of testability and its recourse are embedded in the evaluation model. Including costs in the model makes it possible to find the life cycle cost of any system, which accounts for the costs of the imperfection of the diagnostic system as well as the costs of the resulting consequences, such as mission abortion or mission failure.

3.  System  availability,  reliability,  and  maintainability  could  be  assessed  more 
accurately  by  including  false  alarm  and  other  imperfections  of  the  diagnostic 
system. 

4. For future work, it is suggested to utilize the Markov transition matrix to
investigate the interaction between different states of the system in a
dynamic fashion. Availability, reliability, and maintainability of the system